FIT3181: Deep Learning (2024)¶


Lecturer (Malaysia): Dr Arghya Pal | arghya.pal@monash.edu
Lecturer (Malaysia): Dr Lim Chern Hong | lim.chernhong@monash.edu

CE/Lecturer (Clayton): Dr Trung Le | trunglm@monash.edu
Lecturer (Clayton): Prof Dinh Phung | dinh.phung@monash.edu



School of Information Technology, Monash University, Malaysia
Faculty of Information Technology, Monash University, Australia


Student Information¶


Surname: Wee
Firstname: Brandon
Student ID: 33561826
Email: bwee0004@student.monash.edu
Your tutorial time: Friday, 2pm - 4pm


[Very Important]
Make a copy of thus Google colab notebook including the traces and progresses of model training before submitting.**

Deep Neural Networks¶

Due: 11:55pm Wednesday, 11 September 2024 (Wednesday)¶

Important note: This is an individual assignment. It contributes 25% to your final mark. Read the assignment instructions carefully.¶

What to submit¶

This assignment is to be completed individually and submitted to Moodle unit site. By the due date, you are required to submit one single zip file, named xxx_assignment01_solution.zip where xxx is your student ID, to the corresponding Assignment (Dropbox) in Moodle. You can use Google Colab to do Assigmnent 1 but you need to save it to an *.ipynb file to submit to the unit Moodle.

More importantly, if you use Google Colab to do this assignment, you need to first make a copy of this notebook on your Google drive.

For example, if your student ID is 12356, then gather all of your assignment solution to folder, create a zip file named 123456_assignment01_solution.zip and submit this file.

Within this zip folder, you must submit the following files:

  1. Assignment01_solution.ipynb: this is your Python notebook solution source file.
  2. Assignment01_output.html: this is the output of your Python notebook solution exported in html format.
  3. Any extra files or folder needed to complete your assignment (e.g., images used in your answers).

Since the notebook is quite big to load and work together, one recommended option is to split solution into three parts and work on them seperately. In that case, replace Assignment01_solution.ipynb by three notebooks: Assignment01_Part1_solution.ipynb, Assignment01_Part2_solution.ipynb and Assignment01_Part3_solution.ipynb

Part 1: Theory and Knowledge Questions¶

[Total marks for this part: 30 points]

The first part of this assignment is to demonstrate your knowledge in deep learning that you have acquired from the lectures and tutorials materials. Most of the contents in this assignment are drawn from the lectures and tutorials from weeks 1 to 4. Going through these materials before attempting this part is highly recommended.

Question 1.1 Activation function plays an important role in modern Deep NNs. For each of the activation functions below, state its output range, find its derivative (show your steps), and plot the activation fuction and its derivative¶

(a) Exponential linear unit (ELU): $\text{ELU}(x)=\begin{cases} 0.1\left(\exp(x)-1\right) & \text{if}\,x\leq0\\ x & \text{if}\,x>0 \end{cases}$

[1.5 points]

(b) Gaussian Error Linear Unit (GELU): $\text{GELU}(x)=x\Phi(x)$ where $\Phi(x)$ is the probability cummulative function of the standard Gaussian distribution or $\Phi(x) = \mathbb{P}\left(X\leq x\right)$ where $X \sim N\left(0,1\right)$. In addition, the GELU activation fuction (the link for the main paper) has been widely used in the state-of-the-art Vision for Transformers (e.g., here is the link for the main ViT paper).

[1.5 points]

$\text{(ANSWER FOR QUESTION 1.1)}$

(a) To find the derivative of $\text{ELU}(x)$, we can differentiatie each piece of its piece-wise definition.
When $x \leq0\ $:

Let $y = 0.1(\exp(x)-1)$, then taking the derivative gives $ \frac{\text{d}y}{\text{d}x} = 0.1\exp(x)$.

When $x>0$:
Let $z=x$, then taking the derivative gives $\frac{\text{d}z}{\text{d}x}=1$.

Therefore, the $\text{ELU}'(x)=\begin{cases} 0.1\left(\exp(x)\right) & \text{if}\,x\leq0\\ 1 & \text{if}\,x>0 \end{cases}$


(b) To find the derivative of $\text{GELU}(x)$, we can use the product rule.

Hence, $\text{GELU}'(x) = x\Phi'(x) + \Phi(x)$, where $\Phi'(x) = \frac{\text{d}}{\text{d}x}(\Phi(x))$

In [ ]:
"""
Below shows a code using numPy and Matplotlib that provides a plot of the
ELU function and its derivative.
"""

import numpy as np
import matplotlib.pyplot as plt

def elu(x):
  y = np.where(x <= 0, 0.1 * (np.exp(x) - 1), x)
  return y

def elu_derivative(x):
  y = np.where(x <= 0, 0.1 * np.exp(x), 1)
  return y

x = np.linspace(-5, 5, 100)

y = elu(x)
y_derivative = elu_derivative(x)

plt.figure(figsize=(8, 6))
plt.plot(x, y, label='ELU')

plt.plot(x, y_derivative, label='ELU Derivative')

plt.xlabel('x')
plt.ylabel('y')
plt.title('ELU Activation Function and its Derivative')
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image
In [ ]:
"""
Below shows a code using numPy, Matplotlib and scipy that provides a plot of the
GELU function and its derivative.

the Scipy package is imported so that we can use the probability density function
and cumulative distribution function for the gaussian/normal distribution.
"""

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def gelu(x):
  y = x * norm.cdf(x)
  return y

def gelu_derivative(x):
  y = x * norm.pdf(x) + norm.cdf(x)
  return y

x = np.linspace(-5, 5, 100)

y = gelu(x)
y_derivative = gelu_derivative(x)

plt.figure(figsize=(8, 6))
plt.plot(x, y, label='GELU')

plt.plot(x, y_derivative, label='GELU Derivative')

plt.xlabel('x')
plt.ylabel('y')
plt.title('GELU Activation Function and its Derivative')
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image

Question 1.2: Assume that we feed a data point $x$ with a ground-truth label $y=2$ to the feed-forward neural network with the ReLU activation function as shown in the following figure¶

image.png

(a) What is the numerical value of the latent presentation $h^1(x)$?

[1 point]

(b) What is the numerical value of the latent presentation $h^2(x)$?

[1 point]

(c) What is the numerical value of the logit $h^3(x)$?

[1 point]

(d) What is the corresonding prediction probabilities $p(x)$?

[1 point]

(e) What is the predicted label $\widehat{y}$? Is it a correct and an incorect prediction? Remind that $y=2$.

[1 point]

(f) What is the cross-entropy loss caused by the feed-forward neural network at $(x,y)$? Remind that $y=2$.

[1 point]

(g) Why is the cross-entropy loss caused by the feed-forward neural network at $(x,y)$ (i.e., $\text{CE}(1_y, p(x))$) always non-negative? When does this $\text{CE}(1_y, p(x))$ loss get the value $0$? Note that you need to answer this question for a general pair $(x,y)$ and a general feed-forward neural network with, for example $M=4$ classes?

[1 point]

You must show both formulas and numerical results for earning full mark. Although it is optional, it is great if you show your PyTorch code for your computation.

$\text{(ANSWER FOR QUESTION 1.2)}$

(a) For this question, we can use the formula $x_{n+1}=\sigma(x_{n}W^{n+1}+b^{n+1})$, where $h^n(x) = x_n$. For this question, $\sigma(x) = \text{ReLU}(x) = \max(0, x_{i,j})$.

To find the latent presentation of $h^1(x)$, we can just substitute $n=0$ into the formula. Therefore, $h^1(x) = x_1 =\text{ReLU}(x_{0}W^{1}+b^{1})$.

This gives us the following matrix multiplication:
$h^1(x) = \text{ReLU}\Biggl(\begin{bmatrix} 1.2 & -1 & 2 \end{bmatrix}\begin{bmatrix} 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ 2 & 2 & -2 & -2 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 & 1 \end{bmatrix}\Biggr)$
$=\text{ReLU}(\begin{bmatrix} 6.2 & 2.8 & -1.8 & -5.2 \end{bmatrix})$
$=\begin{bmatrix} 6.2 & 2.8 & 0 & 0 \end{bmatrix}$.

(b) To find the latent presentation of $h^2(x)$, we can just substitute $n=1$ into the formula. Therefore, $h^2(x) = x_2 =\text{ReLU}(x_{1}W^{2}+b^{2})$.

This gives us the following matrix multiplication:
$h^2(x) = \text{ReLU}\Biggl(\begin{bmatrix} 6.2 & 2.8 & 0 & 0 \end{bmatrix}\begin{bmatrix} -1 & 1 & -1 \\ 1 & 1 & -1 \\ -1 & 1 & 1 \\ -1 & -1 & 2\end{bmatrix} + \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}\Biggr)$
$=\text{ReLU}(\begin{bmatrix} -2.4 & 9 & -8 \end{bmatrix})$
$=\begin{bmatrix} 0 & 9 & 0 \end{bmatrix}$.

(c) To find the logit of $h^3(x)$, we can just substitute $n=2$ into the formula and not apply the softmax activation function onto the matrix multiplication results. Therefore, $h_{\text{logit}}^3(x) = x_{2}W^{3}+b^{3}$.

This gives us the following matrix multiplication:
$h_{\text{logit}}^3(x) = \begin{bmatrix} 0 & 9 & 0 \end{bmatrix} \begin{bmatrix} 2 & -2 \\ -2 & 2 \\ 2 & 2 \end{bmatrix} + \begin{bmatrix} 1 & 1.5 \end{bmatrix}$
$=\begin{bmatrix} -17 & 19.5 \end{bmatrix}$.

(d) To compute the probabilities $p(x)$, we can use the following formula to compute the softmax activation function.

$p(x) = \text{softmax}(x) = \Biggl[\frac{\exp(h_m)}{∑(\exp(h_i))} | m \in [1, n]\Biggr]$.
Therefore, $p(x) = \text{softmax}(\begin{bmatrix} -17 & 19.5 \end{bmatrix})$
$= \begin{bmatrix} \frac{\exp(-17)}{\exp(-17) + \exp(19.5)} & \frac{\exp(19.5)}{\exp(-17) + \exp(19.5)} \end{bmatrix}$
$=\begin{bmatrix} 1.4×10^{-16} & 1.00×10^{0} \end{bmatrix}$.

(e) The predicted label, $\widehat{y}$, is the value of $m$ such that, $\text{argmax}(p_m)$. In this case, $p(x) = \begin{bmatrix} 1.4×10^{-16} & 1.00×10^{0} \end{bmatrix} ⟹ p_2 > p_1$. Hence $\widehat{y} = 2$ is the predicted label and hence it is a correct prediction.

(f) To compute the Cross-entropy loss, we can take the negative logarithm of the predicted probability. Hence, $\text{CE}(y, p_2) = -\log(1.00) = 0$.

(g) If a neural network has $M = 4$ classes, the sum of the probabilities of the $M$ classes should equal $1$. Therefore, for any pair $(x, y)$, the probability $x$ will have the range $0 \leq x \leq 1$. Therefore, $\log x \leq 0$ for this domain of x. Hence, $\log x \leq 0 ⟹ -\log x \geq 0 ⟹ \text{CE}(1_y, x) \geq 0$, which means that the cross-entropy loss is always non-negative. The only instance for which $\text{CE}(1_y, x) = 0$, is when $x = 1$, which is when one class has a probaility is 1, and the rest of the $M-1$ classes has a probability of 0.

In [ ]:
import torch

# Defined function for feedforward process
def next_layer(sigma, x, w, b):
  h = torch.matmul(x, w) + b
  print(f"Logit = {h}")
  if sigma == 'ReLU':
    return torch.relu(h)
  elif sigma == 'softmax':
    return torch.softmax(h, dim=1)


x0 = torch.tensor([[1.2, -1, 2]], dtype=torch.float32)
W1 = torch.tensor([
                  [1, -1, 1, -1],
                  [-1, 1, -1, 1],
                  [2, 2, -2, -2]
                  ],
                  dtype=torch.float32)
b1 = torch.tensor([[0, 1, 0, 1]], dtype=torch.float32)
x1 = next_layer("ReLU", x0, W1, b1)
print(f"x1 = {x1}\n")

W2 = torch.tensor([[-1, 1, -1], [1, 1, -1], [-1, 1, 1], [-1, -1, 2]], dtype=torch.float32)
b2 = torch.tensor([[1, 0, 1]], dtype=torch.float32)
x2 = next_layer("ReLU", x1, W2, b2)
print(f"x2 = {x2}\n")

W3 = torch.tensor([[2, -2], [-2, 2], [2, 2]], dtype=torch.float32)
b3 = torch.tensor([[1, 1.5]], dtype=torch.float32)
x3 = next_layer("softmax", x2, W3, b3)
print(f"x3 = {x3}")
Logit = tensor([[ 6.2000,  2.8000, -1.8000, -5.2000]])
x1 = tensor([[6.2000, 2.8000, 0.0000, 0.0000]])

Logit = tensor([[-2.4000,  9.0000, -8.0000]])
x2 = tensor([[0., 9., 0.]])

Logit = tensor([[-17.0000,  19.5000]])
x3 = tensor([[1.4069e-16, 1.0000e+00]])

Question 1.3:¶

For Question 1.3, you have two options:

  • (1) perform the forward, backward propagation, and SGD update for one mini-batch (10 points), or
  • (2) manually implement a feed-forward neural network that can work on real tabular datasets (20 points).

You can choose either (1) or (2) to proceed.

Option 1¶

[Total marks for this option: 10 points]

Assume that we are constructing a multilayered feed-forward neural network for a classification problem with three classes where the model parameters will be generated randomly using your student ID. The architecture of this network is $3 (Input)\rightarrow 5(ELU) \rightarrow 3(Output)$ as shown in the following figure. Note that the ELU has the same formula as the one in Q1.1.

We feed a batch $X$ with the labels $Y$ as shown in the figure. Answer the following questions.

image.png

You need to show both formulas, numerical results, and your PyTorch code for your computation for earning full marks.

In [ ]:
import torch
student_id = 1234           #insert your student id here for example 1234
torch.manual_seed(student_id)
Out[ ]:
<torch._C.Generator at 0x2b2d9089650>
In [ ]:
#Code to generate random matrices and biases for W1, b1, W2, b2

Forward propagation

(a) What is the value of $\bar{h}^{1}(x)$ (the pre-activation values of $h^1$)?

[0.5 point]
In [ ]:
#Show your code

(b) What is the value of $h^{1}(x)$?

[0.5 point]
In [ ]:
#Show your code

(c) What is the predicted value $\widehat{y}$?

[0.5 point]
In [ ]:
#Show your code

(d) Suppose that we use the cross-entropy (CE) loss. What is the value of the CE loss $l$ incurred by the mini-batch?

[0.5 point]

In [ ]:
#Show your code

Backward propagation

(e) What are the derivatives $\frac{\partial l}{\partial h^{2}},\frac{\partial l}{\partial W^{2}}$, and $\frac{\partial l}{\partial b^{2}}$?

[3 points]
In [ ]:
#Show your code

(f) What are the derivatives $\frac{\partial l}{\partial h^{1}}, \frac{\partial l}{\partial \bar{h}^{1}},\frac{\partial l}{\partial W^{1}}$, and $\frac{\partial l}{\partial b^{1}}$?

[3 points]
In [ ]:
#Show your code

SGD update

(g) Assume that we use SGD with learning rate $\eta=0.01$ to update the model parameters. What are the values of $W^2, b^2$ and $W^1, b^1$ after updating?

[2 points]
In [ ]:
#Show your code

Option 2¶

[Total marks for this option: 20 points]
In [ ]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In Option 2, you need to implement a feed-forward NN manually using PyTorch and auto-differentiation of PyTorch. We then manually train the model on the MNIST dataset.

We first download the MNIST dataset and preprocess it.

In [ ]:
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the image to a tensor with shape [C, H, W]
    transforms.Normalize((0.5,), (0.5,)),  # Normalize to [-1, 1]

    transforms.Lambda(lambda x: x.view(28*28)) # Flatten the tensor to shape [-1,HW]
])

# Load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_data, train_labels = train_dataset.data, train_dataset.targets
test_data, test_labels = test_dataset.data, test_dataset.targets
print(train_data.shape, train_labels.shape)
print(test_data.shape, test_labels.shape)
torch.Size([60000, 28, 28]) torch.Size([60000])
torch.Size([10000, 28, 28]) torch.Size([10000])

Each data point has dimension [28,28]. We need to flatten it to a vector to input to our FFN.

In [ ]:
train_dataset.data = train_data.data.view(-1, 28*28)
test_dataset.data = test_data.data.view(-1, 28*28)

train_data, train_labels = train_dataset.data, train_dataset.targets
test_data, test_labels = test_dataset.data, test_dataset.targets
print(train_data.shape, train_labels.shape)
print(test_data.shape, test_labels.shape)
torch.Size([60000, 784]) torch.Size([60000])
torch.Size([10000, 784]) torch.Size([10000])
In [ ]:
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

Develop the feed-forward neural networks

(a) You need to develop the class MyLinear with the following skeleton

[3 points]
In [ ]:
class MyLinear(torch.nn.Module):
  def __init__(self, input_size, output_size):
    """
    input_size: the size of the input
    output_size: the size of the output
    """
    super().__init__()
    #Your code here
    self.weights = torch.nn.Parameter(torch.randn(input_size, output_size))
    self.bias = torch.nn.Parameter(torch.randn(output_size))

  #forward propagation
  def forward(self, x): #x is a mini-batch
    #Your code here
    return torch.matmul(x, self.weights) + self.bias

(b) You need to develop the class MyFFN with the following skeleton

[7 points]
In [ ]:
class MyFFN(torch.nn.Module):
  def __init__(self, input_size, num_classes, hidden_sizes, act = torch.nn.ReLU()):
    """
    input_size: the size of the input
    num_classes: the number of classes
    act is the activation function
    hidden_sizes is the list of hidden sizes
    for example input_size = 3, hidden_sizes = [5, 7], num_classes = 4, and act = torch.nn.ReLU()
    means that we are building up a FFN with the confirguration
    (3 (Input) -> 5 (ReLU) -> 7 (ReLU) -> 4 (Output))
    """
    super(MyFFN, self).__init__()
    self.input_size = input_size
    self.num_classes = num_classes
    self.act = act()
    self.hidden_sizes = hidden_sizes
    self.num_layers = len(hidden_sizes) + 1

    self.create_FFN()

  def create_FFN(self):
    """
    This function creates the feed-forward neural network
    We stack many MyLinear layers
    """
    hidden_sizes = [self.input_size] + self.hidden_sizes + [self.num_classes]
    self.layers = torch.nn.ModuleList()
    for i in range(len(hidden_sizes) - 1):
        self.layers.append(MyLinear(hidden_sizes[i], hidden_sizes[i+1]))


  def forward(self,x):
    """
    This implements the forward propagation of the batch x
    This needs to return the prediction probabilities of x
    """
    for i in range(len(self.layers) - 1):
        x = self.act(self.layers[i](x))

    return self.layers[-1](x)

  def compute_loss(self, x, y):
    """
    This function computes the cross-entropy loss
    """
    #Your code here
    logits = self.forward(x)
    return torch.nn.functional.cross_entropy(logits, y)

  def update_SGD(self, x, y, learning_rate = 0.01):
    """
    This function updates the model parameters using SGD using the batch (x,y)
    """
    #Your code here
    loss = self.compute_loss(x, y)
    loss.backward()
    with torch.no_grad():
      for layer in self.layers:
        layer.weights -= learning_rate * layer.weights.grad
        layer.bias -= learning_rate * layer.bias.grad

        layer.weights.grad.zero_()
        layer.bias.grad.zero_()

  def update_SGDwithMomentum(self, x, y, learning_rate = 0.01, momentum = 0.9):
    """
    This function updates the model parameters using SGD with momentum using the batch (x,y)
    This code is based on the documentation provided by PyTorch 2.4.

    https://pytorch.org/docs/stable/generated/torch.optim.SGD.html
    """
    #Your code here

    weight_decay = 0.001
    dampening = 0

    if not hasattr(self, 'momentum_buffers'):
      self.momentum_buffers = []
      for layer in self.layers:
          self.momentum_buffers.append({'weights': torch.zeros_like(layer.weights), 'bias': torch.zeros_like(layer.bias)})

    loss = self.compute_loss(x, y)
    loss.backward()

    with torch.no_grad():
      for i, layer in enumerate(self.layers):
        grad_w = layer.weights.grad
        grad_b = layer.bias.grad

        grad_w += weight_decay * layer.weights

        self.momentum_buffers[i]['weights'] = momentum * self.momentum_buffers[i]['weights'] + (1 - dampening) * grad_w
        self.momentum_buffers[i]['bias'] = momentum * self.momentum_buffers[i]['bias'] + (1 - dampening) * grad_b

        layer.weights -= learning_rate * grad_w
        layer.bias -= learning_rate * grad_b

        layer.weights.grad.zero_()
        layer.bias.grad.zero_()

  def update_AdaGrad(self, x, y, learning_rate = 0.01):
    """
    This function updates the model parameters using AdaGrad using the batch (x,y)
    """
    #Your code here
    if not hasattr(self, 'grad_squared'):
      self.grad_squared = []
      for layer in self.layers:
        self.grad_squared.append({'weights': torch.zeros_like(layer.weights), 'bias': torch.zeros_like(layer.bias)})

    loss = self.compute_loss(x, y)
    loss.backward()
    with torch.no_grad():
      for i, layer in enumerate(self.layers):
        self.grad_squared[i]['weights'] += layer.weights.grad ** 2
        self.grad_squared[i]['bias'] += layer.bias.grad ** 2
        layer.weights -= learning_rate * layer.weights.grad / (torch.sqrt(self.grad_squared[i]['weights']) + 1e-7)
        layer.bias -= learning_rate * layer.bias.grad / (torch.sqrt(self.grad_squared[i]['bias']) + 1e-7)
        layer.weights.grad.zero_()
        layer.bias.grad.zero_()
In [ ]:
myFFN = MyFFN(input_size = 28*28, num_classes = 26, hidden_sizes = [100, 100], act = torch.nn.ReLU)
myFFN.create_FFN()
print(myFFN)
MyFFN(
  (act): ReLU()
  (layers): ModuleList(
    (0-2): 3 x MyLinear()
  )
)

(c) Write the code to evaluate the accuracy of the current myFFN model on a data loader (e.g., train_loader or test_loader).

[2.5 points]
In [ ]:
def compute_acc(model, data_loader):
  """
  This function computes the accuracy of the model on a data loader
  """
  #Your code here
  correct = 0
  total = 0
  for x, y in data_loader:
    outputs = model(x)
    _, predicted = torch.max(outputs, 1)
    total += y.size(0)
    correct += (predicted == y).sum().item()
  return correct / total

(c) Write the code to evaluate the loss of the current myFFN model on a data loader (e.g., train_loader or test_loader).

[2.5 points]
In [ ]:
def compute_loss(model, data_loader):
  """
  This function computes the loss of the model on a data loader
  """
  #Your code here
  total_loss = 0
  total_samples = 0
  for x, y in data_loader:
    loss = model.compute_loss(x, y)
    total_loss += loss.item() * y.size(0)
    total_samples += y.size(0)
  return total_loss / total_samples

Train on the MNIST data with 50 epochs using updateSGD.

In [ ]:
num_epochs = 50
for epoch in range(num_epochs):
    for i, (x, y) in enumerate(train_loader):
      myFFN.update_SGD(x, y, learning_rate = 0.01)
    train_acc = compute_acc(myFFN, train_loader)
    train_loss = compute_loss(myFFN, train_loader)
    test_acc = compute_acc(myFFN, test_loader)
    test_loss = compute_loss(myFFN, test_loader)
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")
Epoch 1/50, Train Loss: 1.7378, Train Acc: 35.04%, Test Loss: 1.8221, Test Acc: 36.16%
Epoch 2/50, Train Loss: 1.7048, Train Acc: 36.44%, Test Loss: 1.7784, Test Acc: 37.11%
Epoch 3/50, Train Loss: 1.6679, Train Acc: 38.57%, Test Loss: 1.7543, Test Acc: 39.30%
Epoch 4/50, Train Loss: 1.6432, Train Acc: 39.33%, Test Loss: 1.7257, Test Acc: 39.87%
Epoch 5/50, Train Loss: 1.6110, Train Acc: 40.63%, Test Loss: 1.6959, Test Acc: 40.92%
Epoch 6/50, Train Loss: 1.6347, Train Acc: 39.13%, Test Loss: 1.7212, Test Acc: 39.64%
Epoch 7/50, Train Loss: 1.7118, Train Acc: 38.81%, Test Loss: 1.7731, Test Acc: 39.60%
Epoch 8/50, Train Loss: 1.5788, Train Acc: 44.07%, Test Loss: 1.6836, Test Acc: 44.10%
Epoch 9/50, Train Loss: 1.5114, Train Acc: 43.94%, Test Loss: 1.6051, Test Acc: 44.28%
Epoch 10/50, Train Loss: 1.4873, Train Acc: 45.81%, Test Loss: 1.5784, Test Acc: 45.91%
Epoch 11/50, Train Loss: 1.4818, Train Acc: 45.36%, Test Loss: 1.5677, Test Acc: 45.66%
Epoch 12/50, Train Loss: 1.4473, Train Acc: 47.51%, Test Loss: 1.5624, Test Acc: 47.34%
Epoch 13/50, Train Loss: 1.4209, Train Acc: 47.04%, Test Loss: 1.5142, Test Acc: 47.24%
Epoch 14/50, Train Loss: 1.4117, Train Acc: 47.48%, Test Loss: 1.5014, Test Acc: 47.37%
Epoch 15/50, Train Loss: 1.4411, Train Acc: 48.07%, Test Loss: 1.5498, Test Acc: 47.79%
Epoch 16/50, Train Loss: 1.3739, Train Acc: 49.43%, Test Loss: 1.4682, Test Acc: 49.29%
Epoch 17/50, Train Loss: 1.3620, Train Acc: 50.10%, Test Loss: 1.4655, Test Acc: 49.80%
Epoch 18/50, Train Loss: 1.3306, Train Acc: 51.45%, Test Loss: 1.4365, Test Acc: 51.32%
Epoch 19/50, Train Loss: 1.7139, Train Acc: 36.83%, Test Loss: 1.8274, Test Acc: 36.70%
Epoch 20/50, Train Loss: 1.2977, Train Acc: 52.65%, Test Loss: 1.3849, Test Acc: 52.46%
Epoch 21/50, Train Loss: 1.3027, Train Acc: 51.18%, Test Loss: 1.3940, Test Acc: 50.86%
Epoch 22/50, Train Loss: 1.3158, Train Acc: 52.64%, Test Loss: 1.4248, Test Acc: 52.65%
Epoch 23/50, Train Loss: 1.2644, Train Acc: 55.04%, Test Loss: 1.3786, Test Acc: 54.50%
Epoch 24/50, Train Loss: 1.2341, Train Acc: 55.35%, Test Loss: 1.3543, Test Acc: 55.08%
Epoch 25/50, Train Loss: 1.2162, Train Acc: 55.93%, Test Loss: 1.3303, Test Acc: 55.48%
Epoch 26/50, Train Loss: 1.2210, Train Acc: 55.38%, Test Loss: 1.3150, Test Acc: 55.10%
Epoch 27/50, Train Loss: 1.1890, Train Acc: 54.95%, Test Loss: 1.3027, Test Acc: 54.62%
Epoch 28/50, Train Loss: 1.2143, Train Acc: 56.71%, Test Loss: 1.3209, Test Acc: 56.43%
Epoch 29/50, Train Loss: 1.2441, Train Acc: 57.06%, Test Loss: 1.3455, Test Acc: 56.60%
Epoch 30/50, Train Loss: 1.1604, Train Acc: 57.50%, Test Loss: 1.2548, Test Acc: 57.05%
Epoch 31/50, Train Loss: 1.1705, Train Acc: 55.73%, Test Loss: 1.2688, Test Acc: 55.76%
Epoch 32/50, Train Loss: 1.5186, Train Acc: 48.08%, Test Loss: 1.6486, Test Acc: 46.89%
Epoch 33/50, Train Loss: 1.1215, Train Acc: 58.04%, Test Loss: 1.2350, Test Acc: 57.90%
Epoch 34/50, Train Loss: 1.1359, Train Acc: 59.00%, Test Loss: 1.2416, Test Acc: 58.62%
Epoch 35/50, Train Loss: 1.1310, Train Acc: 59.75%, Test Loss: 1.2314, Test Acc: 59.33%
Epoch 36/50, Train Loss: 1.1140, Train Acc: 60.32%, Test Loss: 1.2452, Test Acc: 59.83%
Epoch 37/50, Train Loss: 1.0867, Train Acc: 59.01%, Test Loss: 1.1893, Test Acc: 58.82%
Epoch 38/50, Train Loss: 1.0804, Train Acc: 61.61%, Test Loss: 1.1858, Test Acc: 61.51%
Epoch 39/50, Train Loss: 1.2273, Train Acc: 56.35%, Test Loss: 1.3213, Test Acc: 55.89%
Epoch 40/50, Train Loss: 1.0706, Train Acc: 61.70%, Test Loss: 1.1755, Test Acc: 61.41%
Epoch 41/50, Train Loss: 1.1073, Train Acc: 62.21%, Test Loss: 1.2060, Test Acc: 62.37%
Epoch 42/50, Train Loss: 1.0418, Train Acc: 62.58%, Test Loss: 1.1625, Test Acc: 62.02%
Epoch 43/50, Train Loss: 1.0626, Train Acc: 63.02%, Test Loss: 1.1902, Test Acc: 62.50%
Epoch 44/50, Train Loss: 1.0089, Train Acc: 63.55%, Test Loss: 1.1196, Test Acc: 63.24%
Epoch 45/50, Train Loss: 1.0145, Train Acc: 62.20%, Test Loss: 1.1172, Test Acc: 62.01%
Epoch 46/50, Train Loss: 1.0026, Train Acc: 63.88%, Test Loss: 1.0946, Test Acc: 63.55%
Epoch 47/50, Train Loss: 1.0255, Train Acc: 62.58%, Test Loss: 1.1398, Test Acc: 62.57%
Epoch 48/50, Train Loss: 1.0011, Train Acc: 64.74%, Test Loss: 1.1004, Test Acc: 64.60%
Epoch 49/50, Train Loss: 1.0809, Train Acc: 62.11%, Test Loss: 1.1935, Test Acc: 61.90%
Epoch 50/50, Train Loss: 0.9700, Train Acc: 65.44%, Test Loss: 1.0847, Test Acc: 65.40%

(d) Implement the function updateSGDMomentum in the class and train the model with this optimizer in 50 epochs. You can update the corresponding function in the MyFNN class.

[2.5 points]
In [ ]:
#Your code here
num_epochs = 50
for epoch in range(num_epochs):
    for i, (x, y) in enumerate(train_loader):
      myFFN.update_SGDwithMomentum(x, y, learning_rate = 0.01)
    train_acc = compute_acc(myFFN, train_loader)
    train_loss = compute_loss(myFFN, train_loader)
    test_acc = compute_acc(myFFN, test_loader)
    test_loss = compute_loss(myFFN, test_loader)
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")
Epoch 1/50, Train Loss: 1.1503, Train Acc: 59.35%, Test Loss: 1.2843, Test Acc: 59.44%
Epoch 2/50, Train Loss: 1.1680, Train Acc: 57.11%, Test Loss: 1.3098, Test Acc: 56.78%
Epoch 3/50, Train Loss: 1.1466, Train Acc: 57.79%, Test Loss: 1.2827, Test Acc: 57.91%
Epoch 4/50, Train Loss: 1.1449, Train Acc: 59.45%, Test Loss: 1.2539, Test Acc: 59.59%
Epoch 5/50, Train Loss: 1.1098, Train Acc: 59.83%, Test Loss: 1.2298, Test Acc: 59.44%
Epoch 6/50, Train Loss: 1.2376, Train Acc: 48.77%, Test Loss: 1.3648, Test Acc: 48.93%
Epoch 7/50, Train Loss: 1.1007, Train Acc: 60.69%, Test Loss: 1.2325, Test Acc: 60.56%
Epoch 8/50, Train Loss: 1.1093, Train Acc: 60.73%, Test Loss: 1.2406, Test Acc: 60.49%
Epoch 9/50, Train Loss: 1.3944, Train Acc: 53.43%, Test Loss: 1.5353, Test Acc: 52.86%
Epoch 10/50, Train Loss: 1.0759, Train Acc: 61.49%, Test Loss: 1.1990, Test Acc: 61.11%
Epoch 11/50, Train Loss: 1.0663, Train Acc: 60.97%, Test Loss: 1.1787, Test Acc: 60.67%
Epoch 12/50, Train Loss: 1.0810, Train Acc: 59.66%, Test Loss: 1.2065, Test Acc: 59.54%
Epoch 13/50, Train Loss: 1.1290, Train Acc: 59.32%, Test Loss: 1.2288, Test Acc: 59.14%
Epoch 14/50, Train Loss: 1.0288, Train Acc: 62.13%, Test Loss: 1.1295, Test Acc: 61.87%
Epoch 15/50, Train Loss: 1.0715, Train Acc: 59.82%, Test Loss: 1.1650, Test Acc: 59.81%
Epoch 16/50, Train Loss: 0.9782, Train Acc: 66.56%, Test Loss: 1.0672, Test Acc: 66.34%
Epoch 17/50, Train Loss: 0.9124, Train Acc: 68.35%, Test Loss: 0.9967, Test Acc: 68.52%
Epoch 18/50, Train Loss: 0.8927, Train Acc: 70.57%, Test Loss: 0.9711, Test Acc: 70.48%
Epoch 19/50, Train Loss: 0.9591, Train Acc: 66.68%, Test Loss: 1.0245, Test Acc: 66.99%
Epoch 20/50, Train Loss: 0.8683, Train Acc: 69.78%, Test Loss: 0.9415, Test Acc: 70.33%
Epoch 21/50, Train Loss: 0.8737, Train Acc: 70.91%, Test Loss: 0.9520, Test Acc: 70.93%
Epoch 22/50, Train Loss: 0.8224, Train Acc: 71.96%, Test Loss: 0.9055, Test Acc: 71.90%
Epoch 23/50, Train Loss: 0.8383, Train Acc: 71.87%, Test Loss: 0.9061, Test Acc: 71.93%
Epoch 24/50, Train Loss: 0.8615, Train Acc: 71.46%, Test Loss: 0.9386, Test Acc: 71.38%
Epoch 25/50, Train Loss: 0.8167, Train Acc: 72.98%, Test Loss: 0.8944, Test Acc: 73.03%
Epoch 26/50, Train Loss: 0.7912, Train Acc: 73.87%, Test Loss: 0.8587, Test Acc: 73.77%
Epoch 27/50, Train Loss: 0.7703, Train Acc: 74.33%, Test Loss: 0.8364, Test Acc: 74.58%
Epoch 28/50, Train Loss: 0.7617, Train Acc: 75.31%, Test Loss: 0.8119, Test Acc: 75.40%
Epoch 29/50, Train Loss: 0.7569, Train Acc: 75.72%, Test Loss: 0.8233, Test Acc: 75.57%
Epoch 30/50, Train Loss: 0.7660, Train Acc: 74.69%, Test Loss: 0.8230, Test Acc: 74.51%
Epoch 31/50, Train Loss: 0.7627, Train Acc: 74.94%, Test Loss: 0.8280, Test Acc: 74.95%
Epoch 32/50, Train Loss: 0.7313, Train Acc: 76.63%, Test Loss: 0.7927, Test Acc: 76.49%
Epoch 33/50, Train Loss: 0.8026, Train Acc: 73.12%, Test Loss: 0.8580, Test Acc: 73.51%
Epoch 34/50, Train Loss: 0.7582, Train Acc: 74.59%, Test Loss: 0.8351, Test Acc: 74.33%
Epoch 35/50, Train Loss: 0.7429, Train Acc: 75.39%, Test Loss: 0.7982, Test Acc: 75.27%
Epoch 36/50, Train Loss: 0.6787, Train Acc: 78.09%, Test Loss: 0.7332, Test Acc: 77.92%
Epoch 37/50, Train Loss: 0.6943, Train Acc: 77.14%, Test Loss: 0.7537, Test Acc: 76.91%
Epoch 38/50, Train Loss: 0.6584, Train Acc: 78.61%, Test Loss: 0.7171, Test Acc: 78.40%
Epoch 39/50, Train Loss: 0.6726, Train Acc: 77.38%, Test Loss: 0.7276, Test Acc: 76.74%
Epoch 40/50, Train Loss: 0.6248, Train Acc: 79.55%, Test Loss: 0.6743, Test Acc: 78.84%
Epoch 41/50, Train Loss: 0.6194, Train Acc: 78.95%, Test Loss: 0.6620, Test Acc: 78.69%
Epoch 42/50, Train Loss: 0.6097, Train Acc: 80.00%, Test Loss: 0.6592, Test Acc: 79.53%
Epoch 43/50, Train Loss: 0.6070, Train Acc: 80.22%, Test Loss: 0.6602, Test Acc: 79.57%
Epoch 44/50, Train Loss: 0.6292, Train Acc: 79.48%, Test Loss: 0.6780, Test Acc: 78.89%
Epoch 45/50, Train Loss: 0.5883, Train Acc: 80.30%, Test Loss: 0.6382, Test Acc: 79.75%
Epoch 46/50, Train Loss: 0.5941, Train Acc: 80.92%, Test Loss: 0.6447, Test Acc: 80.54%
Epoch 47/50, Train Loss: 0.7537, Train Acc: 75.67%, Test Loss: 0.8025, Test Acc: 75.58%
Epoch 48/50, Train Loss: 0.5727, Train Acc: 81.48%, Test Loss: 0.6324, Test Acc: 81.44%
Epoch 49/50, Train Loss: 0.5497, Train Acc: 82.56%, Test Loss: 0.5970, Test Acc: 82.18%
Epoch 50/50, Train Loss: 0.5476, Train Acc: 82.62%, Test Loss: 0.5947, Test Acc: 82.57%

(e) Implement the function updateAdagrad in the class and train the model with this optimizer in 50 epochs. You can update the corresponding function in the MyFNN class.

[2.5 points]
In [ ]:
#Your code here
num_epochs = 50
for epoch in range(num_epochs):
    for i, (x, y) in enumerate(train_loader):
      myFFN.update_AdaGrad(x, y, learning_rate = 0.01)
    train_acc = compute_acc(myFFN, train_loader)
    train_loss = compute_loss(myFFN, train_loader)
    test_acc = compute_acc(myFFN, test_loader)
    test_loss = compute_loss(myFFN, test_loader)
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")
Epoch 1/50, Train Loss: 54.7369, Train Acc: 76.63%, Test Loss: 50.5629, Test Acc: 77.67%
Epoch 2/50, Train Loss: 40.8939, Train Acc: 80.16%, Test Loss: 37.9174, Test Acc: 81.06%
Epoch 3/50, Train Loss: 34.4007, Train Acc: 82.26%, Test Loss: 32.8206, Test Acc: 82.82%
Epoch 4/50, Train Loss: 30.6552, Train Acc: 83.43%, Test Loss: 29.2081, Test Acc: 83.95%
Epoch 5/50, Train Loss: 27.6838, Train Acc: 84.16%, Test Loss: 26.6370, Test Acc: 84.66%
Epoch 6/50, Train Loss: 25.6277, Train Acc: 84.84%, Test Loss: 24.6740, Test Acc: 85.13%
Epoch 7/50, Train Loss: 23.9816, Train Acc: 85.47%, Test Loss: 23.3811, Test Acc: 85.55%
Epoch 8/50, Train Loss: 22.5088, Train Acc: 85.84%, Test Loss: 22.2491, Test Acc: 86.09%
Epoch 9/50, Train Loss: 21.4772, Train Acc: 86.03%, Test Loss: 21.3394, Test Acc: 85.91%
Epoch 10/50, Train Loss: 20.4561, Train Acc: 86.61%, Test Loss: 20.2450, Test Acc: 86.62%
Epoch 11/50, Train Loss: 19.4261, Train Acc: 86.98%, Test Loss: 19.5653, Test Acc: 86.76%
Epoch 12/50, Train Loss: 18.6551, Train Acc: 87.19%, Test Loss: 19.0263, Test Acc: 86.77%
Epoch 13/50, Train Loss: 17.9624, Train Acc: 87.38%, Test Loss: 18.4756, Test Acc: 86.91%
Epoch 14/50, Train Loss: 17.3734, Train Acc: 87.67%, Test Loss: 17.9008, Test Acc: 87.13%
Epoch 15/50, Train Loss: 16.7291, Train Acc: 87.81%, Test Loss: 17.3190, Test Acc: 87.35%
Epoch 16/50, Train Loss: 16.2020, Train Acc: 87.98%, Test Loss: 16.8973, Test Acc: 87.54%
Epoch 17/50, Train Loss: 15.7190, Train Acc: 88.22%, Test Loss: 16.4542, Test Acc: 87.70%
Epoch 18/50, Train Loss: 15.2374, Train Acc: 88.41%, Test Loss: 16.1206, Test Acc: 87.84%
Epoch 19/50, Train Loss: 14.8880, Train Acc: 88.44%, Test Loss: 15.7790, Test Acc: 88.04%
Epoch 20/50, Train Loss: 14.4875, Train Acc: 88.59%, Test Loss: 15.4817, Test Acc: 88.09%
Epoch 21/50, Train Loss: 14.0844, Train Acc: 88.81%, Test Loss: 15.1538, Test Acc: 88.36%
Epoch 22/50, Train Loss: 13.7907, Train Acc: 88.87%, Test Loss: 14.9675, Test Acc: 88.30%
Epoch 23/50, Train Loss: 13.4494, Train Acc: 89.05%, Test Loss: 14.6191, Test Acc: 88.38%
Epoch 24/50, Train Loss: 13.1749, Train Acc: 89.16%, Test Loss: 14.3627, Test Acc: 88.31%
Epoch 25/50, Train Loss: 12.8813, Train Acc: 89.35%, Test Loss: 14.2309, Test Acc: 88.46%
Epoch 26/50, Train Loss: 12.6115, Train Acc: 89.39%, Test Loss: 13.9394, Test Acc: 88.78%
Epoch 27/50, Train Loss: 12.3049, Train Acc: 89.56%, Test Loss: 13.7403, Test Acc: 88.75%
Epoch 28/50, Train Loss: 12.0982, Train Acc: 89.63%, Test Loss: 13.5893, Test Acc: 88.98%
Epoch 29/50, Train Loss: 11.9149, Train Acc: 89.67%, Test Loss: 13.4442, Test Acc: 88.90%
Epoch 30/50, Train Loss: 11.6483, Train Acc: 89.90%, Test Loss: 13.1822, Test Acc: 89.03%
Epoch 31/50, Train Loss: 11.4289, Train Acc: 89.97%, Test Loss: 13.0485, Test Acc: 88.99%
Epoch 32/50, Train Loss: 11.2471, Train Acc: 90.02%, Test Loss: 12.8518, Test Acc: 89.00%
Epoch 33/50, Train Loss: 11.0314, Train Acc: 90.06%, Test Loss: 12.7065, Test Acc: 89.16%
Epoch 34/50, Train Loss: 10.8463, Train Acc: 90.19%, Test Loss: 12.4977, Test Acc: 89.15%
Epoch 35/50, Train Loss: 10.6792, Train Acc: 90.26%, Test Loss: 12.4383, Test Acc: 89.08%
Epoch 36/50, Train Loss: 10.4951, Train Acc: 90.32%, Test Loss: 12.3173, Test Acc: 89.24%
Epoch 37/50, Train Loss: 10.3312, Train Acc: 90.47%, Test Loss: 12.1289, Test Acc: 89.38%
Epoch 38/50, Train Loss: 10.2009, Train Acc: 90.44%, Test Loss: 12.1107, Test Acc: 89.34%
Epoch 39/50, Train Loss: 10.0723, Train Acc: 90.50%, Test Loss: 11.9528, Test Acc: 89.44%
Epoch 40/50, Train Loss: 9.8899, Train Acc: 90.56%, Test Loss: 11.8112, Test Acc: 89.29%
Epoch 41/50, Train Loss: 9.7214, Train Acc: 90.75%, Test Loss: 11.6641, Test Acc: 89.47%
Epoch 42/50, Train Loss: 9.6081, Train Acc: 90.68%, Test Loss: 11.5468, Test Acc: 89.29%
Epoch 43/50, Train Loss: 9.4814, Train Acc: 90.73%, Test Loss: 11.4469, Test Acc: 89.47%
Epoch 44/50, Train Loss: 9.3451, Train Acc: 90.88%, Test Loss: 11.3694, Test Acc: 89.48%
Epoch 45/50, Train Loss: 9.2122, Train Acc: 91.02%, Test Loss: 11.2464, Test Acc: 89.49%
Epoch 46/50, Train Loss: 9.0976, Train Acc: 91.05%, Test Loss: 11.1682, Test Acc: 89.59%
Epoch 47/50, Train Loss: 8.9702, Train Acc: 91.06%, Test Loss: 11.0352, Test Acc: 89.65%
Epoch 48/50, Train Loss: 8.8576, Train Acc: 91.11%, Test Loss: 10.9785, Test Acc: 89.67%
Epoch 49/50, Train Loss: 8.7759, Train Acc: 91.14%, Test Loss: 10.8407, Test Acc: 89.70%
Epoch 50/50, Train Loss: 8.6631, Train Acc: 91.18%, Test Loss: 10.8240, Test Acc: 89.72%

Part 2: Deep Neural Networks (DNN) ¶

[Total marks for this part: 25 points]

The second part of this assignment is to demonstrate your basis knowledge in deep learning that you have acquired from the lectures and tutorials materials. Most of the contents in this assignment are drawn from the tutorials covered from weeks 1 to 2. Going through these materials before attempting this assignment is highly recommended.

In the second part of this assignment, you are going to work with the FashionMNIST dataset for image recognition task. It has the exact same format as MNIST (70,000 grayscale images of 28 × 28 pixels each with 10 classes), but the images represent fashion items rather than handwritten digits, so each class is more diverse, and the problem is significantly more challenging than MNIST.

In [ ]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

torch.manual_seed(1234)
Out[ ]:
<torch._C.Generator at 0x2b2d9089650>

Load the Fashion MNIST using torchvision

In [ ]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

train_dataset_orgin = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

print(train_dataset_orgin.data.shape, train_dataset_orgin.targets.shape)
print(test_dataset.data.shape, test_dataset.targets.shape)

train_dataset_orgin.data = train_dataset_orgin.data.view(-1, 28*28)
test_dataset.data = test_dataset.data.view(-1, 28*28)

print(train_dataset_orgin.data.shape, train_dataset_orgin.targets.shape)
print(test_dataset.data.shape, test_dataset.targets.shape)

N = len(train_dataset_orgin)
print(f"Number of training samples: {N}")
N_train = int(0.9*N)
N_val = N - N_train
print(f"Number of training samples: {N_train}")
print(f"Number of validation samples: {N_val}")

train_dataset, val_dataset = torch.utils.data.random_split(train_dataset_orgin, [N_train, N_val])


train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=64, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=1000, shuffle=False)
torch.Size([60000, 28, 28]) torch.Size([60000])
torch.Size([10000, 28, 28]) torch.Size([10000])
torch.Size([60000, 784]) torch.Size([60000])
torch.Size([10000, 784]) torch.Size([10000])
Number of training samples: 60000
Number of training samples: 54000
Number of validation samples: 6000

Question 2.1: Write the code to visualize a mini-batch in train_loader including its images and labels.¶

[5 points]
In [ ]:
#Your code here
import matplotlib.pyplot as plt
import numpy as np

def imshow(img, labels):
    img = img.numpy()
    img = img.reshape(-1, 28, 28)
    fig, axes = plt.subplots(8, 8, figsize=(8, 8))
    for i, ax in enumerate(axes.flat):
        ax.imshow(img[i], cmap='gray')
        ax.set_title(f'Label: {labels[i].item()}')
        ax.axis('off')
    plt.tight_layout()
    plt.show()

dataiter = iter(train_loader)
images, labels = next(dataiter)

imshow(images, labels)
No description has been provided for this image

Question 2.2: Write the code for the feed-forward neural net using PyTorch¶

[5 points]

We now develop a feed-forward neural network with the architecture $784 \rightarrow 40(ReLU) \rightarrow 30(ReLU) \rightarrow 10(softmax)$. You can choose your own way to implement your network and an optimizer of interest. You should train model in $50$ epochs and evaluate the trained model on the test set.

In [ ]:
#Your code here
import torch
import torch.nn as nn
import torch.optim as optim

class FeedforwardModel(nn.Module):
    def __init__(self):
        super(FeedforwardModel, self).__init__()
        self.fc1 = nn.Linear(784, 40)
        self.fc2 = nn.Linear(40, 30)
        self.fc3 = nn.Linear(30, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.softmax(self.fc3(x), dim=1)
        return x

model = FeedforwardModel()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def calculate_accuracy(loader, model):
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

def calculate_loss(loader, model, criterion):
    total_loss = 0.0
    with torch.no_grad():
        for images, labels in loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
    return total_loss / len(loader)

num_epochs = 20
for epoch in range(num_epochs):
    running_loss = 0.0
    correct = 0
    total = 0

    model.train()
    for images, labels in train_loader:
        images = images.view(-1, 28 * 28)

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    train_loss = running_loss / len(train_loader)
    train_acc = correct / total

    model.eval()
    test_loss = calculate_loss(test_loader, model, criterion)
    test_acc = calculate_accuracy(test_loader, model)

    # Print results for the current epoch
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")
Epoch 1/20, Train Loss: 1.7124, Train Acc: 76.42%, Test Loss: 1.6540, Test Acc: 81.59%
Epoch 2/20, Train Loss: 1.6329, Train Acc: 83.30%, Test Loss: 1.6372, Test Acc: 82.87%
Epoch 3/20, Train Loss: 1.6195, Train Acc: 84.39%, Test Loss: 1.6264, Test Acc: 83.73%
Epoch 4/20, Train Loss: 1.6120, Train Acc: 85.06%, Test Loss: 1.6321, Test Acc: 82.93%
Epoch 5/20, Train Loss: 1.6077, Train Acc: 85.49%, Test Loss: 1.6200, Test Acc: 84.29%
Epoch 6/20, Train Loss: 1.6025, Train Acc: 86.01%, Test Loss: 1.6215, Test Acc: 84.04%
Epoch 7/20, Train Loss: 1.5987, Train Acc: 86.36%, Test Loss: 1.6156, Test Acc: 84.61%
Epoch 8/20, Train Loss: 1.5957, Train Acc: 86.65%, Test Loss: 1.6119, Test Acc: 84.74%
Epoch 9/20, Train Loss: 1.5932, Train Acc: 86.83%, Test Loss: 1.6093, Test Acc: 85.18%
Epoch 10/20, Train Loss: 1.5905, Train Acc: 87.07%, Test Loss: 1.6046, Test Acc: 85.58%
Epoch 11/20, Train Loss: 1.5873, Train Acc: 87.44%, Test Loss: 1.6054, Test Acc: 85.61%
Epoch 12/20, Train Loss: 1.5858, Train Acc: 87.59%, Test Loss: 1.6060, Test Acc: 85.45%
Epoch 13/20, Train Loss: 1.5849, Train Acc: 87.69%, Test Loss: 1.6079, Test Acc: 85.26%
Epoch 14/20, Train Loss: 1.5826, Train Acc: 87.90%, Test Loss: 1.6052, Test Acc: 85.56%
Epoch 15/20, Train Loss: 1.5809, Train Acc: 88.06%, Test Loss: 1.6026, Test Acc: 85.82%
Epoch 16/20, Train Loss: 1.5800, Train Acc: 88.14%, Test Loss: 1.6068, Test Acc: 85.34%
Epoch 17/20, Train Loss: 1.5781, Train Acc: 88.36%, Test Loss: 1.5962, Test Acc: 86.42%
Epoch 18/20, Train Loss: 1.5761, Train Acc: 88.50%, Test Loss: 1.5981, Test Acc: 86.31%
Epoch 19/20, Train Loss: 1.5756, Train Acc: 88.61%, Test Loss: 1.6028, Test Acc: 85.89%
Epoch 20/20, Train Loss: 1.5746, Train Acc: 88.70%, Test Loss: 1.5986, Test Acc: 86.29%

Question 2.3: Tuning hyper-parameters with grid search¶

[5 points]

Assume that you need to tune the number of neurons on the first and second hidden layers $n_1 \in \{20, 40\}$, $n_2 \in \{20, 40\}$ and the used activation function $act \in \{sigmoid, tanh, relu\}$. The network has the architecture pattern $784 \rightarrow n_1 (act) \rightarrow n_2(act) \rightarrow 10(softmax)$ where $n_1, n_2$, and $act$ are in their grides. Write the code to tune the hyper-parameters $n_1, n_2$, and $act$. Note that you can freely choose the optimizer and learning rate of interest for this task.

In [ ]:
#Your code here
import torch
import torch.nn as nn
import torch.optim as optim
import itertools

class GridSearchModel(nn.Module):
    def __init__(self, n1, n2, activation):
        super(GridSearchModel, self).__init__()
        self.fc1 = nn.Linear(784, n1)
        self.fc2 = nn.Linear(n1, n2)
        self.fc3 = nn.Linear(n2, 10)
        self.activation = activation

    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = torch.softmax(self.fc3(x), dim=1)
        return x

def evaluate_model(n1, n2, activation_fn):
    model = GridSearchModel(n1, n2, activation_fn)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    num_epochs = 20
    for epoch in range(num_epochs):
        for images, labels in train_loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    return accuracy

n1_values = [20, 40]
n2_values = [20, 40]
activation_functions = {
    'relu': torch.relu,
    'sigmoid': torch.sigmoid,
    'tanh': torch.tanh
}

best_accuracy = 0
best_params = None

for n1, n2, (act_name, act_fn) in itertools.product(n1_values, n2_values, activation_functions.items()):
    accuracy = evaluate_model(n1, n2, act_fn)
    print(f'n1: {n1}, n2: {n2}, activation: {act_name}, Accuracy: {accuracy:.2f}%')

    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_params = (n1, n2, act_name)

print(f'Best Params: n1 = {best_params[0]}, n2 = {best_params[1]}, activation = {best_params[2]}')
print(f'Best Accuracy: {best_accuracy:.2f}%')
n1: 20, n2: 20, activation: relu, Accuracy: 85.17%
n1: 20, n2: 20, activation: sigmoid, Accuracy: 84.63%
n1: 20, n2: 20, activation: tanh, Accuracy: 84.79%
n1: 20, n2: 40, activation: relu, Accuracy: 85.65%
n1: 20, n2: 40, activation: sigmoid, Accuracy: 84.98%
n1: 20, n2: 40, activation: tanh, Accuracy: 84.97%
n1: 40, n2: 20, activation: relu, Accuracy: 85.76%
n1: 40, n2: 20, activation: sigmoid, Accuracy: 82.02%
n1: 40, n2: 20, activation: tanh, Accuracy: 85.25%
n1: 40, n2: 40, activation: relu, Accuracy: 86.42%
n1: 40, n2: 40, activation: sigmoid, Accuracy: 85.90%
n1: 40, n2: 40, activation: tanh, Accuracy: 86.09%
Best Params: n1 = 40, n2 = 40, activation = relu
Best Accuracy: 86.42%

Question 2.4: Implement the loss with the form: $loss(p,y)=CE(1_{y},p)+\lambda H(p)$ where $H(p)=-\sum_{i=1}^{M}p_{i}\log p_{i}$ is the entropy of $p$, $p$ is the prediction probabilities of a data point $x$ with the ground-truth label $y$, $1_y$ is an one-hot label, and $\lambda >0$ is a trade-off parameter. Set $\lambda = 0.1$ to train a model.¶

[5 points]
In [ ]:
#Your code here
def H(outputs):
  return -torch.sum(outputs * torch.log(outputs + 1e-8))

def loss_function(outputs, labels):
  cross_entropy_loss = nn.CrossEntropyLoss()(outputs, labels)
  loss = cross_entropy_loss + 0.1 * H(torch.softmax(outputs, dim=1))
  return loss

import torch
import torch.nn as nn
import torch.optim as optim

class FFModel(nn.Module):
    def __init__(self):
        super(FFModel, self).__init__()
        self.fc1 = nn.Linear(784, 40)
        self.fc2 = nn.Linear(40, 40)
        self.fc3 = nn.Linear(40, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = FFModel()
criterion = loss_function
optimizer = optim.Adam(model.parameters(), lr=0.01)

def calculate_accuracy(loader, model):
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

def calculate_loss(loader, model, criterion):
    total_loss = 0.0
    with torch.no_grad():
        for images, labels in loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
    return total_loss / len(loader)

num_epochs = 50
for epoch in range(num_epochs):
    running_loss = 0.0
    correct = 0
    total = 0

    model.train()
    for images, labels in train_loader:
        images = images.view(-1, 28 * 28)

        outputs = model(images)
        loss = criterion(outputs, labels)
        print(loss)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    train_loss = calculate_loss(train_loader, model, criterion)
    train_acc = calculate_accuracy(train_loader, model)

    model.eval()
    test_loss = calculate_loss(test_loader, model, criterion)
    test_acc = calculate_accuracy(test_loader, model)

    # Print results for the current epoch
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")
Epoch 1/50, Train Loss: 4.5516, Train Acc: 9.96%, Test Loss: 23.8562, Test Acc: 10.00%
Epoch 2/50, Train Loss: 4.3624, Train Acc: 9.95%, Test Loss: 26.0304, Test Acc: 10.00%
Epoch 3/50, Train Loss: 4.3257, Train Acc: 9.95%, Test Loss: 23.5802, Test Acc: 10.00%
Epoch 4/50, Train Loss: 4.3107, Train Acc: 9.95%, Test Loss: 22.9420, Test Acc: 10.00%
Epoch 5/50, Train Loss: 4.2978, Train Acc: 9.95%, Test Loss: 25.3002, Test Acc: 10.00%
Epoch 6/50, Train Loss: 4.2856, Train Acc: 9.95%, Test Loss: 25.5628, Test Acc: 10.00%
Epoch 7/50, Train Loss: 4.2799, Train Acc: 9.95%, Test Loss: 23.9246, Test Acc: 10.00%
Epoch 8/50, Train Loss: 4.2751, Train Acc: 9.95%, Test Loss: 24.9006, Test Acc: 10.00%
Epoch 9/50, Train Loss: 4.2708, Train Acc: 9.95%, Test Loss: 25.4032, Test Acc: 10.00%
Epoch 10/50, Train Loss: 4.2681, Train Acc: 9.96%, Test Loss: 24.1193, Test Acc: 10.00%
Epoch 11/50, Train Loss: 4.2567, Train Acc: 9.95%, Test Loss: 23.2982, Test Acc: 10.00%
Epoch 12/50, Train Loss: 4.2595, Train Acc: 9.95%, Test Loss: 24.0120, Test Acc: 10.00%
Epoch 13/50, Train Loss: 4.2618, Train Acc: 9.95%, Test Loss: 25.5567, Test Acc: 10.00%
Epoch 14/50, Train Loss: 4.2419, Train Acc: 9.95%, Test Loss: 24.6986, Test Acc: 10.00%
Epoch 15/50, Train Loss: 4.2489, Train Acc: 9.95%, Test Loss: 25.6232, Test Acc: 10.00%
Epoch 16/50, Train Loss: 4.2480, Train Acc: 9.95%, Test Loss: 24.1571, Test Acc: 10.00%
Epoch 17/50, Train Loss: 4.2491, Train Acc: 9.95%, Test Loss: 25.2983, Test Acc: 10.00%
Epoch 18/50, Train Loss: 4.2412, Train Acc: 9.95%, Test Loss: 23.9229, Test Acc: 10.00%
Epoch 19/50, Train Loss: 4.2484, Train Acc: 9.95%, Test Loss: 25.1636, Test Acc: 10.00%
Epoch 20/50, Train Loss: 4.2328, Train Acc: 9.95%, Test Loss: 25.9687, Test Acc: 10.00%
Epoch 21/50, Train Loss: 4.2342, Train Acc: 9.95%, Test Loss: 24.1425, Test Acc: 10.00%
Epoch 22/50, Train Loss: 4.2399, Train Acc: 9.95%, Test Loss: 23.9586, Test Acc: 10.00%
Epoch 23/50, Train Loss: 4.2345, Train Acc: 9.95%, Test Loss: 24.6491, Test Acc: 10.00%
Epoch 24/50, Train Loss: 4.2289, Train Acc: 9.95%, Test Loss: 23.0658, Test Acc: 10.00%
Epoch 25/50, Train Loss: 4.0609, Train Acc: 16.60%, Test Loss: 19.9357, Test Acc: 29.79%
Epoch 26/50, Train Loss: 3.2646, Train Acc: 37.72%, Test Loss: 17.9622, Test Acc: 38.41%
Epoch 27/50, Train Loss: 2.2274, Train Acc: 61.41%, Test Loss: 11.1712, Test Acc: 66.43%
Epoch 28/50, Train Loss: 2.0080, Train Acc: 67.05%, Test Loss: 12.4111, Test Acc: 65.59%
Epoch 29/50, Train Loss: 1.9578, Train Acc: 67.30%, Test Loss: 10.6782, Test Acc: 64.98%
Epoch 30/50, Train Loss: 1.9485, Train Acc: 67.27%, Test Loss: 11.7647, Test Acc: 65.72%
Epoch 31/50, Train Loss: 1.9585, Train Acc: 66.57%, Test Loss: 9.8239, Test Acc: 65.99%
Epoch 32/50, Train Loss: 1.9342, Train Acc: 67.06%, Test Loss: 9.9229, Test Acc: 65.73%
Epoch 33/50, Train Loss: 1.9492, Train Acc: 66.54%, Test Loss: 9.4921, Test Acc: 64.45%
Epoch 34/50, Train Loss: 1.9658, Train Acc: 66.92%, Test Loss: 10.9805, Test Acc: 64.48%
Epoch 35/50, Train Loss: 1.9394, Train Acc: 67.06%, Test Loss: 11.0351, Test Acc: 65.97%
Epoch 36/50, Train Loss: 1.9306, Train Acc: 67.28%, Test Loss: 11.8496, Test Acc: 65.09%
Epoch 37/50, Train Loss: 1.9212, Train Acc: 67.48%, Test Loss: 12.8103, Test Acc: 66.14%
Epoch 38/50, Train Loss: 1.8907, Train Acc: 67.66%, Test Loss: 11.3412, Test Acc: 65.68%
Epoch 39/50, Train Loss: 1.9189, Train Acc: 67.20%, Test Loss: 11.4262, Test Acc: 65.06%
Epoch 40/50, Train Loss: 1.9058, Train Acc: 67.66%, Test Loss: 10.0777, Test Acc: 66.71%
Epoch 41/50, Train Loss: 1.4354, Train Acc: 79.74%, Test Loss: 5.8100, Test Acc: 82.97%
Epoch 42/50, Train Loss: 1.2371, Train Acc: 84.14%, Test Loss: 6.2854, Test Acc: 82.43%
Epoch 43/50, Train Loss: 1.3240, Train Acc: 82.10%, Test Loss: 7.9611, Test Acc: 82.16%
Epoch 44/50, Train Loss: 1.1755, Train Acc: 84.93%, Test Loss: 6.0656, Test Acc: 82.01%
Epoch 45/50, Train Loss: 1.1332, Train Acc: 85.41%, Test Loss: 5.8475, Test Acc: 83.34%
Epoch 46/50, Train Loss: 1.1613, Train Acc: 85.08%, Test Loss: 5.0420, Test Acc: 82.82%
Epoch 47/50, Train Loss: 1.1648, Train Acc: 84.72%, Test Loss: 5.5795, Test Acc: 82.55%
Epoch 48/50, Train Loss: 1.1654, Train Acc: 84.83%, Test Loss: 5.2048, Test Acc: 82.96%
Epoch 49/50, Train Loss: 1.1580, Train Acc: 84.70%, Test Loss: 6.4616, Test Acc: 82.99%
Epoch 50/50, Train Loss: 1.1378, Train Acc: 85.39%, Test Loss: 5.6053, Test Acc: 83.73%

Question 2.5: Experimenting with sharpness-aware minimization technique¶

[5 points]

Sharpness-aware minimization (SAM) (i.e., link for main paper from Google Deepmind) is a simple yet but efficient technique to improve the generalization ability of deep learning models on unseen data examples. In your research or your work, you might potentially use this idea. Your task is to read the paper and implement Sharpness-aware minimization (SAM). Finally, you need to apply SAM to the best architecture found in Question 2.3.

In [ ]:
# The paper given gave a link for a sample implementation sharpness-aware minimization.

# Here is the git repository for the sample implementation: https://github.com/davda54/sam/blob/main/README.md

# The implementation of the SAM optimizer is inspired by the paper given and the git repository provided.

# For this implementation, we will use adaptive = False, and rho = 0.1.
In [ ]:
#Your code here

import torch
import torch.nn as nn
import torch.optim as optim


class SAM(optim.Optimizer):
    def __init__(self, params, base_optimizer, lr=0.01, rho=0.05):
        defaults = {'rho': rho}
        super(SAM, self).__init__(params, defaults)
        self.base_optimizer = base_optimizer(self.param_groups)
        self.param_groups = self.base_optimizer.param_groups

        for k, v in self.base_optimizer.defaults.items():
            self.defaults[k] = v

    @torch.no_grad()
    def first_step(self):
        grad_norm = self._grad_norm()

        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue

                self.state[p]["old_p"] = p.data.clone()
                e_w = p.grad / (grad_norm + 1e-12)
                p.add_(e_w)

        self.zero_grad()

    @torch.no_grad()
    def second_step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue

                p.data = self.state[p]["old_p"]

        self.base_optimizer.step()
        self.zero_grad()

    def _grad_norm(self):
        norm = torch.norm(torch.stack([(p.grad).norm(p=2) for group in self.param_groups for p in group["params"] if p.grad is not None]), p=2)
        return norm


class FFModel(nn.Module):
    def __init__(self):
        super(FFModel, self).__init__()
        self.fc1 = nn.Linear(784, 40)
        self.fc2 = nn.Linear(40, 40)
        self.fc3 = nn.Linear(40, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = FFModel()
criterion = nn.CrossEntropyLoss()
base_optimizer = optim.SGD
optimizer = SAM(model.parameters(), base_optimizer, lr=0.01)

def calculate_accuracy(loader, model):
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

def calculate_loss(loader, model, criterion):
    total_loss = 0.0
    with torch.no_grad():
        for images, labels in loader:
            images = images.view(-1, 28 * 28)
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
    return total_loss / len(loader)

num_epochs = 50
for epoch in range(num_epochs):
    running_loss = 0.0
    correct = 0
    total = 0

    model.train()
    for images, labels in train_loader:
        images = images.view(-1, 28 * 28)

        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.first_step()

        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.second_step()

    train_loss = calculate_loss(train_loader, model, criterion)
    train_acc = calculate_accuracy(train_loader, model)

    model.eval()
    test_loss = calculate_loss(test_loader, model, criterion)
    test_acc = calculate_accuracy(test_loader, model)

    # Print results for the current epoch
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc*100:.2f}%, Test Loss: {test_loss:.4f}, Test Acc: {test_acc*100:.2f}%")
Epoch 1/50, Train Loss: 2.2964, Train Acc: 10.24%, Test Loss: 2.2964, Test Acc: 10.27%
Epoch 2/50, Train Loss: 2.2970, Train Acc: 10.77%, Test Loss: 2.2970, Test Acc: 10.95%
Epoch 3/50, Train Loss: 2.2969, Train Acc: 11.92%, Test Loss: 2.2969, Test Acc: 12.15%
Epoch 4/50, Train Loss: 2.2968, Train Acc: 13.36%, Test Loss: 2.2967, Test Acc: 13.63%
Epoch 5/50, Train Loss: 2.2966, Train Acc: 14.95%, Test Loss: 2.2966, Test Acc: 15.21%
Epoch 6/50, Train Loss: 2.2966, Train Acc: 16.56%, Test Loss: 2.2966, Test Acc: 16.69%
Epoch 7/50, Train Loss: 2.2962, Train Acc: 17.38%, Test Loss: 2.2962, Test Acc: 17.60%
Epoch 8/50, Train Loss: 2.2956, Train Acc: 17.28%, Test Loss: 2.2957, Test Acc: 17.32%
Epoch 9/50, Train Loss: 2.2948, Train Acc: 17.10%, Test Loss: 2.2949, Test Acc: 16.98%
Epoch 10/50, Train Loss: 2.2939, Train Acc: 17.21%, Test Loss: 2.2940, Test Acc: 16.99%
Epoch 11/50, Train Loss: 2.2930, Train Acc: 17.41%, Test Loss: 2.2931, Test Acc: 17.11%
Epoch 12/50, Train Loss: 2.2921, Train Acc: 17.85%, Test Loss: 2.2923, Test Acc: 17.86%
Epoch 13/50, Train Loss: 2.2911, Train Acc: 18.40%, Test Loss: 2.2913, Test Acc: 18.29%
Epoch 14/50, Train Loss: 2.2895, Train Acc: 18.91%, Test Loss: 2.2897, Test Acc: 18.93%
Epoch 15/50, Train Loss: 2.2875, Train Acc: 19.32%, Test Loss: 2.2876, Test Acc: 19.46%
Epoch 16/50, Train Loss: 2.2848, Train Acc: 19.44%, Test Loss: 2.2850, Test Acc: 19.25%
Epoch 17/50, Train Loss: 2.2817, Train Acc: 19.56%, Test Loss: 2.2820, Test Acc: 19.29%
Epoch 18/50, Train Loss: 2.2786, Train Acc: 20.93%, Test Loss: 2.2789, Test Acc: 20.70%
Epoch 19/50, Train Loss: 2.2754, Train Acc: 22.88%, Test Loss: 2.2758, Test Acc: 22.64%
Epoch 20/50, Train Loss: 2.2715, Train Acc: 27.15%, Test Loss: 2.2719, Test Acc: 26.86%
Epoch 21/50, Train Loss: 2.2676, Train Acc: 28.41%, Test Loss: 2.2680, Test Acc: 28.58%
Epoch 22/50, Train Loss: 2.2628, Train Acc: 28.32%, Test Loss: 2.2634, Test Acc: 28.54%
Epoch 23/50, Train Loss: 2.2578, Train Acc: 28.08%, Test Loss: 2.2584, Test Acc: 28.43%
Epoch 24/50, Train Loss: 2.2526, Train Acc: 28.65%, Test Loss: 2.2533, Test Acc: 29.11%
Epoch 25/50, Train Loss: 2.2466, Train Acc: 28.83%, Test Loss: 2.2473, Test Acc: 29.00%
Epoch 26/50, Train Loss: 2.2400, Train Acc: 28.82%, Test Loss: 2.2407, Test Acc: 28.88%
Epoch 27/50, Train Loss: 2.2326, Train Acc: 28.78%, Test Loss: 2.2334, Test Acc: 28.72%
Epoch 28/50, Train Loss: 2.2236, Train Acc: 28.56%, Test Loss: 2.2245, Test Acc: 28.61%
Epoch 29/50, Train Loss: 2.2137, Train Acc: 28.66%, Test Loss: 2.2146, Test Acc: 28.38%
Epoch 30/50, Train Loss: 2.2032, Train Acc: 28.69%, Test Loss: 2.2043, Test Acc: 28.52%
Epoch 31/50, Train Loss: 2.1926, Train Acc: 28.97%, Test Loss: 2.1937, Test Acc: 28.73%
Epoch 32/50, Train Loss: 2.1802, Train Acc: 28.96%, Test Loss: 2.1814, Test Acc: 28.66%
Epoch 33/50, Train Loss: 2.1665, Train Acc: 28.88%, Test Loss: 2.1678, Test Acc: 28.52%
Epoch 34/50, Train Loss: 2.1515, Train Acc: 28.79%, Test Loss: 2.1530, Test Acc: 28.46%
Epoch 35/50, Train Loss: 2.1351, Train Acc: 28.74%, Test Loss: 2.1367, Test Acc: 28.29%
Epoch 36/50, Train Loss: 2.1174, Train Acc: 28.37%, Test Loss: 2.1192, Test Acc: 28.19%
Epoch 37/50, Train Loss: 2.0983, Train Acc: 28.27%, Test Loss: 2.1002, Test Acc: 28.03%
Epoch 38/50, Train Loss: 2.0765, Train Acc: 28.20%, Test Loss: 2.0786, Test Acc: 27.92%
Epoch 39/50, Train Loss: 2.0512, Train Acc: 27.90%, Test Loss: 2.0536, Test Acc: 27.53%
Epoch 40/50, Train Loss: 2.0243, Train Acc: 27.45%, Test Loss: 2.0269, Test Acc: 27.07%
Epoch 41/50, Train Loss: 1.9923, Train Acc: 27.33%, Test Loss: 1.9951, Test Acc: 26.84%
Epoch 42/50, Train Loss: 1.9555, Train Acc: 27.21%, Test Loss: 1.9587, Test Acc: 26.72%
Epoch 43/50, Train Loss: 1.9145, Train Acc: 26.96%, Test Loss: 1.9181, Test Acc: 26.42%
Epoch 44/50, Train Loss: 1.8693, Train Acc: 26.65%, Test Loss: 1.8735, Test Acc: 26.18%
Epoch 45/50, Train Loss: 1.8262, Train Acc: 27.00%, Test Loss: 1.8310, Test Acc: 26.77%
Epoch 46/50, Train Loss: 1.7902, Train Acc: 28.12%, Test Loss: 1.7956, Test Acc: 27.70%
Epoch 47/50, Train Loss: 1.7622, Train Acc: 28.86%, Test Loss: 1.7682, Test Acc: 28.34%
Epoch 48/50, Train Loss: 1.7409, Train Acc: 30.31%, Test Loss: 1.7473, Test Acc: 29.62%
Epoch 49/50, Train Loss: 1.7213, Train Acc: 31.35%, Test Loss: 1.7281, Test Acc: 30.97%
Epoch 50/50, Train Loss: 1.7030, Train Acc: 32.59%, Test Loss: 1.7102, Test Acc: 32.01%

Part 3: Convolutional Neural Networks and Image Classification¶

[Total marks for this part: 45 points]

The third part of this assignment is to demonstrate your basis knowledge in deep learning that you have acquired from the lectures and tutorials materials. Most of the contents in this assignment are drawn from the tutorials covered from weeks 3 to 6. Going through these materials before attempting this assignment is highly recommended.

The dataset used for this part is a specific dataset for this unit consisting of approximately $10,000$ images of $20$ classes of Animals, each of which has approximately 500 images. You can download the dataset at download here if you want to do your assignment on your machine.

In [ ]:
import os
import requests
import tarfile
import time
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
import torchvision.models as models
import torch.nn as nn
import torch
import PIL.Image
import pathlib
from torchsummary import summary
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(1234)
CUDA is not available.  Training on CPU ...
Out[ ]:
<torch._C.Generator at 0x225ffa394d0>

Download the dataset to the folder of this Google Colab.

In [ ]:
!gdown --fuzzy https://drive.google.com/file/d/1aEkxNWaD02Z8ZNvZzeMefUoY97C-3wTG/view?usp=drive_link
'gdown' is not recognized as an internal or external command,
operable program or batch file.

We unzip the dataset to the folder.

In [ ]:
!unzip -q Animals_Dataset.zip
In [ ]:
data_dir = "./FIT5215_Dataset"

# We resize the images to [3,64,64]
transform = transforms.Compose([transforms.Resize((64,64)),  #resises the image so it can be perfect for our model.
                                      transforms.RandomHorizontalFlip(), # FLips the image w.r.t horizontal axis
                                      #transforms.RandomRotation(4),     #Rotates the image to a specified angel
                                      #transforms.RandomAffine(0, shear=10, scale=(0.8,1.2)), #Performs actions like zooms, change shear angles.
                                      transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2), # Set the color params
                                      transforms.ToTensor(), # convert the image to tensor so that it can work with torch
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # Normalize the images, each R,G,B value is normalized with mean=0.5 and std=0.5
                                      ])


# Load the dataset using torchvision.datasets.ImageFolder and apply transformations
dataset = datasets.ImageFolder(data_dir, transform=transform)

# Split the dataset into training and validation sets
train_size = int(0.9 * len(dataset))
valid_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, valid_size])

# Example of DataLoader creation for training and validation
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

print("Number of instance in train_set: %s" % len(train_dataset))
print("Number of instance in val_set: %s" % len(val_dataset))
Number of instance in train_set: 8519
Number of instance in val_set: 947
In [ ]:
class_names = ['bird', 'bottle', 'bread', 'butterfly', 'cake', 'cat', 'chicken', 'cow', 'dog', 'duck',
                  'elephant', 'fish', 'handgun', 'horse', 'lion', 'lipstick', 'seal', 'snake', 'spider', 'vase']
In [ ]:
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy() # convert images to numpy for display
In [ ]:
import math

def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

def visualize_data(images, categories, images_per_row = 8):
    class_names = ['bird', 'bottle', 'bread', 'butterfly', 'cake', 'cat', 'chicken', 'cow', 'dog', 'duck',
                  'elephant', 'fish', 'handgun', 'horse', 'lion', 'lipstick', 'seal', 'snake', 'spider', 'vase']
    n_images = len(images)
    n_rows = math.ceil(float(n_images)/images_per_row)
    fig = plt.figure(figsize=(1.5*images_per_row, 1.5*n_rows))
    fig.patch.set_facecolor('white')
    for i in range(n_images):
        plt.subplot(n_rows, images_per_row, i+1)
        plt.xticks([])
        plt.yticks([])
        imshow(images[i])
        class_index = categories[i]
        plt.xlabel(class_names[class_index])
    plt.show()
In [ ]:
visualize_data(images, labels)
No description has been provided for this image

For questions 3.1 to 3.7, you'll need to write your own model in a way that makes it easy for you to experiment with different architectures and parameters. The goal is to be able to pass the parameters to initialize a new instance of YourModel to build different network architectures with different parameters. Below are descriptions of some parameters for YourModel:

  1. Block confirguration: Our network consists of many blocks. Each block has the pattern [conv, batch norm, activation, conv, batch norm, activation, max pool, dropout]. All convolutional layers have filter size $(3, 3)$, strides $(1, 1)$ and padding = 1, and all max pool layers have strides $(2, 2)$, kernel size $2$, and padding = 0. The network will consists of a few blocks before applying a linear layer to output the logits for the softmax layer.

  2. list_feature_maps: the number of feature maps in the blocks of the network. For example, if list_feature_maps = [16, 32, 64], our network has three blocks with the input_channels or number of feature maps are 16, 32, and 64 respectively.

  3. drop_rate: the keep probability for dropout. Setting drop_rate to $0.0$ means not using dropout.

  4. batch_norm: the batch normalization function is used or not. Setting batch_norm to false means not using batch normalization.

  5. use_skip: the skip connection is used in the blocks or not. Setting this to true means that we use 1x1 Conv2D with strides=2 for the skip connection.

  6. At the end, you need to apply global average pooling (GAP) (AdaptiveAvgPool2d((1, 1))) to flatten the 3D output tensor before defining the output linear layer for predicting the labels.

Here is the model confirguration of YourCNN if the list_feature_maps = [16, 32, 64] and batch_norm = true.

image.png

Question 3.1: You need to implement the aforementioned CNN.

First, you need to implement the block of our CNN in the class YourBlock. You can ignore use_skip and skip connection for simplicity. However, you cannot earn full marks for this question.

[6 points]
In [ ]:
#Your code here
class YourBlock(nn.Module):
  def __init__(self, in_feature_maps, out_feature_maps, drop_rate = 0.2, batch_norm = True, use_skip = True):
    super(YourBlock, self).__init__()
    self.use_skip = use_skip
    #Your code here
    self.block = nn.Sequential(
            nn.Conv2d(in_feature_maps, out_feature_maps, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_feature_maps) if batch_norm else nn.Identity(),
            nn.ReLU(inplace=True),

            nn.Conv2d(out_feature_maps, out_feature_maps, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_feature_maps) if batch_norm else nn.Identity(),
            nn.ReLU(inplace=True),

            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(drop_rate)
        )

    if self.use_skip and in_feature_maps != out_feature_maps:
        self.skip_conv = nn.Conv2d(in_feature_maps, out_feature_maps, kernel_size=1, stride=2)
    else:
        self.skip_conv = None

  def forward(self, x):
    out = self.block(x)

    if self.use_skip:
      skip = x
      if self.skip_conv:
        skip = self.skip_conv(x)

      out += skip

    return out

Second, you need to use the above YourBlock to implement the class YourCNN.

[6 points]
In [ ]:
class YourCNN(nn.Module):
  def __init__(self, list_feature_maps = [16, 32, 64], drop_rate = 0.2, batch_norm= True, use_skip = True):
    super(YourCNN, self).__init__()
    layers = []
    #Write your code here

    layers.append(nn.Conv2d(3, list_feature_maps[0], kernel_size=3, stride=1, padding=1))
    if batch_norm:
        layers.append(nn.BatchNorm2d(list_feature_maps[0]))
    layers.append(nn.ReLU(inplace=True))

    in_channels = list_feature_maps[0]
    for out_channels in list_feature_maps[1:]:
        layers.append(YourBlock(in_channels, out_channels, drop_rate=drop_rate, batch_norm=batch_norm, use_skip=use_skip))
        in_channels = out_channels

    layers.append(nn.AdaptiveAvgPool2d((1, 1)))
    layers.append(nn.Flatten())
    layers.append(nn.Linear(list_feature_maps[-1], 20))

    self.block = nn.ModuleList(layers)


  def forward(self, x):
    #Write your code here
    for layer in self.block:
        x = layer(x)

    return x

We declare my_cnn from YourCNN as follows.

In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
my_cnn = YourCNN(list_feature_maps = [16, 32, 64], use_skip = True)
my_cnn = my_cnn.to(device)
print(my_cnn)
YourCNN(
  (block): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): YourBlock(
      (block): Sequential(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (7): Dropout(p=0.2, inplace=False)
      )
      (skip_conv): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
    )
    (4): YourBlock(
      (block): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (7): Dropout(p=0.2, inplace=False)
      )
      (skip_conv): Conv2d(32, 64, kernel_size=(1, 1), stride=(2, 2))
    )
    (5): AdaptiveAvgPool2d(output_size=(1, 1))
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=64, out_features=20, bias=True)
  )
)

We declare the optimizer and the loss function.

In [ ]:
# Loss and optimizer
learning_rate = 0.001
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(my_cnn.parameters(), lr=learning_rate)

Here are the codes to compute the loss and accuracy.

In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def compute_loss(model, loss_fn, loader):
  loss = 0
  # Set model to eval mode for inference
  model.eval()
  with torch.no_grad():  # No need to track gradients for validation
    for (batchX, batchY) in loader:
      # Move data to the same device as the model
      batchX, batchY = batchX.to(device).type(torch.float32), batchY.to(device).type(torch.long)
      loss += loss_fn(model(batchX), batchY)
  # Set model back to train mode
  model.train()
  return float(loss)/len(loader)
In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def compute_acc(model, loader):
    correct = 0
    totals = 0
    # Set model to eval mode for inference
    model.eval()
    for (batchX, batchY) in loader:
        # Move batchX and batchY to the same device as the model
        batchX, batchY = batchX.to(device).type(torch.float32), batchY.to(device)
        outputs = model(batchX)  # feed batch to the model
        totals += batchY.size(0)  # accumulate totals with the current batch size
        predicted = torch.argmax(outputs.data, 1)  # get the predicted class
        # Move batchY to the same device as predicted for comparison
        correct += (predicted == batchY).sum().item()
    return correct / totals

Here is the code to train our model.

In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def fit(model= None, train_loader = None, valid_loader= None, optimizer = None,
        num_epochs = 50, verbose = True, seed= 1234):
  torch.manual_seed(seed)
  # Move the model to the device before initializing the optimizer
  model.to(device) # Move the model to the GPU

  if optimizer == None:
    optim = torch.optim.Adam(model.parameters(), lr = 0.001) # Now initialize optimizer with model on GPU
  else:
    optim = optimizer
  history = dict()
  history['val_loss'] = list()
  history['val_acc'] = list()
  history['train_loss'] = list()
  history['train_acc'] = list()

  for epoch in range(num_epochs):
    model.train()
    for (X, y) in train_loader:
      # Move input data to the same device as the model
      X,y = X.to(device), y.to(device)
      # Forward pass
      outputs = model(X.type(torch.float32)) # X is already on the correct device
      loss = loss_fn(outputs, y.type(torch.long))
      # Backward and optimize
      optim.zero_grad()
      loss.backward()
      optim.step()
    #losses and accuracies for epoch
    val_loss = compute_loss(model, loss_fn, valid_loader)
    val_acc = compute_acc(model, valid_loader)
    train_loss = compute_loss(model, loss_fn, train_loader)
    train_acc = compute_acc(model, train_loader)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    if not verbose: #verbose = True means we do show the training information during training
      print(f"Epoch {epoch+1}/{num_epochs}")
      print(f"train loss= {train_loss:.4f} - train acc= {train_acc*100:.2f}% - valid loss= {val_loss:.4f} - valid acc= {val_acc*100:.2f}%")
  return history
In [ ]:
history = fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 10, verbose = False)
Epoch 1/10
train loss= 2.2512 - train acc= 31.25% - valid loss= 2.2757 - valid acc= 29.78%
Epoch 2/10
train loss= 2.0052 - train acc= 38.40% - valid loss= 2.0321 - valid acc= 36.96%
Epoch 3/10
train loss= 1.9124 - train acc= 39.52% - valid loss= 1.9798 - valid acc= 36.96%
Epoch 4/10
train loss= 1.7813 - train acc= 45.35% - valid loss= 1.8076 - valid acc= 42.66%
Epoch 5/10
train loss= 1.7268 - train acc= 45.77% - valid loss= 1.7672 - valid acc= 41.39%
Epoch 6/10
train loss= 1.6529 - train acc= 48.26% - valid loss= 1.6942 - valid acc= 47.73%
Epoch 7/10
train loss= 1.5946 - train acc= 49.65% - valid loss= 1.5874 - valid acc= 49.95%
Epoch 8/10
train loss= 1.5603 - train acc= 50.25% - valid loss= 1.5942 - valid acc= 49.21%
Epoch 9/10
train loss= 1.4616 - train acc= 53.23% - valid loss= 1.4957 - valid acc= 52.06%
Epoch 10/10
train loss= 1.6078 - train acc= 50.29% - valid loss= 1.6187 - valid acc= 46.67%

Question 3.2: Now, let us tune the number of blocks $use\_skip \in \{true,false\}$ and $learning\_rate \in \{0.001, 0.0005\}$. Write your code for this tuning and report the result of the best model on the testing set. Note that you need to show your code for tuning and evaluating on the test set to earn the full marks. During tuning, you can set the instance variable verbose of your model to True for not showing the training details of each epoch.

Note that for this question, depending on your computational resource, you can choose list_feature_maps= [32, 64] or list_feature_maps= [16, 32, 64].

[3 points]
In [ ]:
#Your code here
import itertools
list_feature_maps = [16, 32, 64]
possible_lr = [0.001, 0.0005]
possible_skip = [True, False]

best_accuracy = 0
best_params = []

for lr, skip in itertools.product(possible_lr, possible_skip):
    my_cnn = YourCNN(list_feature_maps = list_feature_maps, use_skip = skip)
    optimizer = torch.optim.Adam(my_cnn.parameters(), lr=lr)
    history = fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 10, verbose = True)
    accuracy = max(history['val_acc'])
    print(f'lr: {lr}, skip: {skip}, Accuracy: {accuracy * 100:.2f}%')

    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_params = [lr, skip]

print(f'Best Params: lr = {best_params[0]}, skip = {best_params[1]}')
lr: 0.001, skip: True, Accuracy: 53.54%
lr: 0.001, skip: False, Accuracy: 58.61%
lr: 0.0005, skip: True, Accuracy: 51.64%
lr: 0.0005, skip: False, Accuracy: 53.54%
Best Params: lr = 0.001, skip = False

Please note that you are struggling in implementing the aforementioned CNN. You can use the MiniVGG network in our labs for doing the following questions. However, you cannot earn any mark for 3.1 and 3.2.

Question 3.3: Exploring Data Mixup Technique for Improving Generalization Ability.

[4 points]

Data mixup is another super-simple technique used to boost the generalization ability of deep learning models. You need to incoroporate data mixup technique to the above deep learning model and experiment its performance. There are some papers and documents for data mixup as follows:

  • Main paper for data mixup link for main paper and a good article article link.

You need to extend your model developed above, train a model using data mixup, and write your observations and comments about the result.

In [ ]:
#Your code here
from torch.autograd import Variable
import numpy

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def mixup_fit(model= None, train_loader = None, valid_loader= None, optimizer = None,
        num_epochs = 50, verbose = True, seed= 1234):
  torch.manual_seed(seed)
  # Move the model to the device before initializing the optimizer
  model.to(device) # Move the model to the GPU

  if optimizer == None:
    optim = torch.optim.Adam(model.parameters(), lr = 0.001) # Now initialize optimizer with model on GPU
  else:
    optim = optimizer
  history = dict()
  history['val_loss'] = list()
  history['val_acc'] = list()
  history['train_loss'] = list()
  history['train_acc'] = list()
  alpha = 0.4

  for epoch in range(num_epochs):
    model.train()
    for (X, y) in train_loader:
      # Move input data to the same device as the model
      X,y = X.to(device), y.to(device)
      index = torch.randperm(X.size(0))
      X2 = X[index]
      y2 = y[index]

      lam = numpy.random.beta(alpha+1, alpha)
      lam = torch.tensor(lam, dtype=torch.float32, device=device)
      x3 = lam * X + (1. - lam) * X2
      y3 = lam * y + (1. - lam) * y2
      optim.zero_grad()
      loss_fn(model(x3), y3.type(torch.long)).backward()
      optim.step()

      outputs = model(X.type(torch.float32))
      loss = loss_fn(outputs, y.type(torch.long))
      optim.zero_grad()
      loss.backward()
      optim.step()

    #losses and accuracies for epoch
    val_loss = compute_loss(model, loss_fn, valid_loader)
    val_acc = compute_acc(model, valid_loader)
    train_loss = compute_loss(model, loss_fn, train_loader)
    train_acc = compute_acc(model, train_loader)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    if not verbose: #verbose = True means we do show the training information during training
      print(f"Epoch {epoch+1}/{num_epochs}")
      print(f"train loss= {train_loss:.4f} - train acc= {train_acc*100:.2f}% - valid loss= {val_loss:.4f} - valid acc= {val_acc*100:.2f}%")
  return history


learning_rate = 0.001
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(my_cnn.parameters(), lr=learning_rate)
history = mixup_fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 10, verbose = False)

"""
Comments about MixUp augmentation:

It works fairly well, it trains the model faster and more accurately compared to the standard training using the same CNN structure. Overall, its good!
"""
Epoch 1/10
train loss= 1.7343 - train acc= 49.83% - valid loss= 1.7619 - valid acc= 49.63%
Epoch 2/10
train loss= 1.6524 - train acc= 52.33% - valid loss= 1.7221 - valid acc= 50.90%
Epoch 3/10
train loss= 1.6130 - train acc= 55.90% - valid loss= 1.6964 - valid acc= 52.59%
Epoch 4/10
train loss= 1.5862 - train acc= 55.69% - valid loss= 1.6637 - valid acc= 53.01%
Epoch 5/10
train loss= 1.5298 - train acc= 56.67% - valid loss= 1.6084 - valid acc= 55.12%
Epoch 6/10
train loss= 1.4831 - train acc= 57.04% - valid loss= 1.5695 - valid acc= 52.16%
Epoch 7/10
train loss= 1.4670 - train acc= 59.60% - valid loss= 1.5300 - valid acc= 58.18%
Epoch 8/10
train loss= 1.3703 - train acc= 62.47% - valid loss= 1.4526 - valid acc= 58.50%
Epoch 9/10
train loss= 1.4241 - train acc= 61.51% - valid loss= 1.5054 - valid acc= 58.18%
Epoch 10/10
train loss= 1.3757 - train acc= 61.24% - valid loss= 1.4717 - valid acc= 58.39%

Question 3.4: Exploring CutMix Technique for Improving Generalization Ability.

[4 points]

There are some papers and documents for CutMix as follows:

  • Main paper for Cutmix link for main paper and a good article article link.

You need to extend your model developed above, train a model using CutMix, and write your observations and comments about the result.

In [ ]:
#Your code here
from torch.autograd import Variable
import numpy

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def shuffle_minibatch(X, y):
  index = torch.randperm(X.size(0))
  return X[index], y[index], index

def cutmix_fit(model= None, train_loader = None, valid_loader= None, optimizer = None,
        num_epochs = 50, verbose = True, seed= 1234):
  torch.manual_seed(seed)
  # Move the model to the device before initializing the optimizer
  model.to(device) # Move the model to the GPU

  if optimizer == None:
    optim = torch.optim.Adam(model.parameters(), lr = 0.001) # Now initialize optimizer with model on GPU
  else:
    optim = optimizer
  history = dict()
  history['val_loss'] = list()
  history['val_acc'] = list()
  history['train_loss'] = list()
  history['train_acc'] = list()

  for epoch in range(num_epochs):
    model.train()
    for (X, y) in train_loader:
      # Move input data to the same device as the model

      X, y = X.to(device), y.to(device)
      input_s, target_s, index = shuffle_minibatch(X, y)
      lam = numpy.random.uniform(0, 1)
      lam = torch.tensor(lam, dtype=torch.float32, device=device)
      W = input_s.size(2)
      H = input_s.size(3)

      r_x = numpy.random.randint(0, W)
      r_y = numpy.random.randint(0, H)

      r_x, r_y = torch.tensor(r_x, dtype=torch.float32, device=device), torch.tensor(r_y, dtype=torch.float32, device=device)

      r_w = W * torch.sqrt(1 - lam)
      r_h = H * torch.sqrt(1 - lam)
      x1 = np.clip(r_x - r_w / 2, 0, W).long()
      x2 = np.clip(r_x + r_w / 2, 0, W).long()
      y1 = np.clip(r_y - r_h / 2, 0, H).long()
      y2 = np.clip(r_y + r_h / 2, 0, H).long()
      # print("x1", x1, "x2", x2, "y1", y1, "y2", y2)
      X[:, :, x1:x2, y1:y2] = input_s[index, :, x1:x2, y1:y2]
      lam = 1 - (x2 - x1) * (y2 - y1) / (W * H)
      # target_X = lam * y + (1. - lam) * target_s

      optim.zero_grad()
      output = model(X)
      loss = loss_fn(output, y.type(torch.long)) * lam + loss_fn(output, target_s.type(torch.long)) * (1. - lam)
      # loss_fn(model(input_X), target_X.type(torch.long)).backward()
      loss.backward()
      optim.step()


    #losses and accuracies for epoch
    val_loss = compute_loss(model, loss_fn, valid_loader)
    val_acc = compute_acc(model, valid_loader)
    train_loss = compute_loss(model, loss_fn, train_loader)
    train_acc = compute_acc(model, train_loader)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    if not verbose: #verbose = True means we do show the training information during training
      print(f"Epoch {epoch+1}/{num_epochs}")
      print(f"train loss= {train_loss:.4f} - train acc= {train_acc*100:.2f}% - valid loss= {val_loss:.4f} - valid acc= {val_acc*100:.2f}%")
  return history


learning_rate = 0.001
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(my_cnn.parameters(), lr=learning_rate)

history = cutmix_fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 80, verbose = False)

"""
Using the CutMix augmentation, the model can be trained to a higher accuracy compared to the standard training. However, it takes a lot more epochs to train the model to a higher accuracy as the standard training.
"""
Epoch 1/80
train loss= 0.8711 - train acc= 77.74% - valid loss= 1.1643 - valid acc= 64.41%
Epoch 2/80
train loss= 0.8587 - train acc= 79.73% - valid loss= 1.1338 - valid acc= 65.89%
Epoch 3/80
train loss= 0.9870 - train acc= 76.63% - valid loss= 1.2009 - valid acc= 66.21%
Epoch 4/80
train loss= 0.9812 - train acc= 76.28% - valid loss= 1.2216 - valid acc= 66.31%
Epoch 5/80
train loss= 0.9003 - train acc= 80.23% - valid loss= 1.1526 - valid acc= 70.22%
Epoch 6/80
train loss= 0.9952 - train acc= 76.01% - valid loss= 1.2617 - valid acc= 65.58%
Epoch 7/80
train loss= 1.0911 - train acc= 73.04% - valid loss= 1.3025 - valid acc= 63.78%
Epoch 8/80
train loss= 1.0037 - train acc= 76.84% - valid loss= 1.2199 - valid acc= 68.53%
Epoch 9/80
train loss= 0.9646 - train acc= 76.49% - valid loss= 1.1577 - valid acc= 68.11%
Epoch 10/80
train loss= 1.0882 - train acc= 74.09% - valid loss= 1.2894 - valid acc= 64.41%
Epoch 11/80
train loss= 0.9774 - train acc= 77.81% - valid loss= 1.1792 - valid acc= 67.16%
Epoch 12/80
train loss= 0.8531 - train acc= 79.43% - valid loss= 1.1033 - valid acc= 69.90%
Epoch 13/80
train loss= 0.8802 - train acc= 80.67% - valid loss= 1.1065 - valid acc= 69.90%
Epoch 14/80
train loss= 0.9678 - train acc= 75.24% - valid loss= 1.2131 - valid acc= 65.79%
Epoch 15/80
train loss= 0.9945 - train acc= 79.34% - valid loss= 1.2208 - valid acc= 68.22%
Epoch 16/80
train loss= 0.9218 - train acc= 79.46% - valid loss= 1.1512 - valid acc= 67.90%
Epoch 17/80
train loss= 0.9104 - train acc= 77.85% - valid loss= 1.1257 - valid acc= 70.43%
Epoch 18/80
train loss= 1.0148 - train acc= 77.98% - valid loss= 1.2241 - valid acc= 67.90%
Epoch 19/80
train loss= 0.8911 - train acc= 78.87% - valid loss= 1.1162 - valid acc= 69.06%
Epoch 20/80
train loss= 0.9060 - train acc= 78.37% - valid loss= 1.1496 - valid acc= 66.53%
Epoch 21/80
train loss= 0.9992 - train acc= 76.77% - valid loss= 1.2242 - valid acc= 66.74%
Epoch 22/80
train loss= 0.9816 - train acc= 77.65% - valid loss= 1.2022 - valid acc= 68.95%
Epoch 23/80
train loss= 0.9108 - train acc= 80.03% - valid loss= 1.1254 - valid acc= 69.80%
Epoch 24/80
train loss= 0.8930 - train acc= 79.25% - valid loss= 1.1145 - valid acc= 70.86%
Epoch 25/80
train loss= 0.8774 - train acc= 78.95% - valid loss= 1.1070 - valid acc= 68.53%
Epoch 26/80
train loss= 0.9006 - train acc= 75.77% - valid loss= 1.1422 - valid acc= 67.58%
Epoch 27/80
train loss= 1.0019 - train acc= 77.71% - valid loss= 1.2281 - valid acc= 69.90%
Epoch 28/80
train loss= 0.9338 - train acc= 77.46% - valid loss= 1.1768 - valid acc= 66.53%
Epoch 29/80
train loss= 0.8351 - train acc= 80.44% - valid loss= 1.0925 - valid acc= 70.12%
Epoch 30/80
train loss= 0.8644 - train acc= 79.94% - valid loss= 1.1020 - valid acc= 70.33%
Epoch 31/80
train loss= 1.0701 - train acc= 77.20% - valid loss= 1.2743 - valid acc= 66.10%
Epoch 32/80
train loss= 0.8879 - train acc= 79.36% - valid loss= 1.1326 - valid acc= 70.12%
Epoch 33/80
train loss= 0.8898 - train acc= 79.28% - valid loss= 1.1566 - valid acc= 67.58%
Epoch 34/80
train loss= 0.9624 - train acc= 78.84% - valid loss= 1.1762 - valid acc= 69.69%
Epoch 35/80
train loss= 0.8140 - train acc= 79.36% - valid loss= 1.0803 - valid acc= 67.58%
Epoch 36/80
train loss= 0.8990 - train acc= 81.12% - valid loss= 1.1202 - valid acc= 69.27%
Epoch 37/80
train loss= 1.0087 - train acc= 76.92% - valid loss= 1.2245 - valid acc= 66.31%
Epoch 38/80
train loss= 1.0057 - train acc= 77.90% - valid loss= 1.2117 - valid acc= 69.17%
Epoch 39/80
train loss= 0.9369 - train acc= 80.55% - valid loss= 1.1385 - valid acc= 71.91%
Epoch 40/80
train loss= 0.9937 - train acc= 77.45% - valid loss= 1.2090 - valid acc= 67.79%
Epoch 41/80
train loss= 0.8543 - train acc= 79.94% - valid loss= 1.0937 - valid acc= 70.33%
Epoch 42/80
train loss= 0.8819 - train acc= 81.02% - valid loss= 1.1353 - valid acc= 70.64%
Epoch 43/80
train loss= 0.8894 - train acc= 78.24% - valid loss= 1.1144 - valid acc= 69.59%
Epoch 44/80
train loss= 0.8462 - train acc= 80.34% - valid loss= 1.0823 - valid acc= 71.59%
Epoch 45/80
train loss= 0.8358 - train acc= 79.47% - valid loss= 1.0805 - valid acc= 70.01%
Epoch 46/80
train loss= 0.8904 - train acc= 80.81% - valid loss= 1.1253 - valid acc= 69.59%
Epoch 47/80
train loss= 0.8016 - train acc= 79.45% - valid loss= 1.0625 - valid acc= 70.86%
Epoch 48/80
train loss= 0.8905 - train acc= 81.18% - valid loss= 1.1344 - valid acc= 71.17%
Epoch 49/80
train loss= 0.8608 - train acc= 80.02% - valid loss= 1.1091 - valid acc= 70.22%
Epoch 50/80
train loss= 0.8978 - train acc= 81.21% - valid loss= 1.1262 - valid acc= 71.38%
Epoch 51/80
train loss= 0.8333 - train acc= 82.29% - valid loss= 1.0750 - valid acc= 73.92%
Epoch 52/80
train loss= 0.8985 - train acc= 80.03% - valid loss= 1.1435 - valid acc= 69.38%
Epoch 53/80
train loss= 0.9486 - train acc= 78.59% - valid loss= 1.1879 - valid acc= 68.32%
Epoch 54/80
train loss= 0.8356 - train acc= 80.19% - valid loss= 1.0665 - valid acc= 68.85%
Epoch 55/80
train loss= 0.7872 - train acc= 80.64% - valid loss= 1.0552 - valid acc= 70.54%
Epoch 56/80
train loss= 0.8538 - train acc= 80.51% - valid loss= 1.1000 - valid acc= 70.12%
Epoch 57/80
train loss= 0.8711 - train acc= 81.24% - valid loss= 1.1136 - valid acc= 71.07%
Epoch 58/80
train loss= 0.8373 - train acc= 81.54% - valid loss= 1.0925 - valid acc= 71.38%
Epoch 59/80
train loss= 0.8473 - train acc= 83.41% - valid loss= 1.0752 - valid acc= 74.55%
Epoch 60/80
train loss= 0.8842 - train acc= 81.05% - valid loss= 1.1741 - valid acc= 69.48%
Epoch 61/80
train loss= 0.8928 - train acc= 80.89% - valid loss= 1.1180 - valid acc= 70.64%
Epoch 62/80
train loss= 0.8561 - train acc= 81.03% - valid loss= 1.0980 - valid acc= 70.01%
Epoch 63/80
train loss= 0.8423 - train acc= 79.36% - valid loss= 1.0972 - valid acc= 69.06%
Epoch 64/80
train loss= 0.9358 - train acc= 78.98% - valid loss= 1.1997 - valid acc= 67.69%
Epoch 65/80
train loss= 0.8280 - train acc= 81.93% - valid loss= 1.0721 - valid acc= 71.49%
Epoch 66/80
train loss= 0.8246 - train acc= 81.75% - valid loss= 1.0682 - valid acc= 71.49%
Epoch 67/80
train loss= 0.7978 - train acc= 81.43% - valid loss= 1.0439 - valid acc= 70.43%
Epoch 68/80
train loss= 0.8422 - train acc= 77.90% - valid loss= 1.1083 - valid acc= 67.37%
Epoch 69/80
train loss= 0.9934 - train acc= 79.05% - valid loss= 1.2531 - valid acc= 68.53%
Epoch 70/80
train loss= 0.8659 - train acc= 78.31% - valid loss= 1.1048 - valid acc= 69.06%
Epoch 71/80
train loss= 0.8386 - train acc= 82.83% - valid loss= 1.0691 - valid acc= 72.65%
Epoch 72/80
train loss= 0.7985 - train acc= 81.63% - valid loss= 1.0632 - valid acc= 70.12%
Epoch 73/80
train loss= 0.9713 - train acc= 78.78% - valid loss= 1.1839 - valid acc= 69.48%
Epoch 74/80
train loss= 0.8781 - train acc= 81.01% - valid loss= 1.1164 - valid acc= 71.38%
Epoch 75/80
train loss= 0.7309 - train acc= 82.15% - valid loss= 1.0049 - valid acc= 72.02%
Epoch 76/80
train loss= 0.8841 - train acc= 83.13% - valid loss= 1.1189 - valid acc= 72.54%
Epoch 77/80
train loss= 0.8610 - train acc= 80.26% - valid loss= 1.1227 - valid acc= 69.27%
Epoch 78/80
train loss= 0.8132 - train acc= 81.93% - valid loss= 1.0593 - valid acc= 72.86%
Epoch 79/80
train loss= 0.8810 - train acc= 82.09% - valid loss= 1.1414 - valid acc= 71.38%
Epoch 80/80
train loss= 0.8059 - train acc= 79.63% - valid loss= 1.0694 - valid acc= 70.33%

Question 3.5: Implement the one-versus-all (OVA) loss. The details are as follows:

  • You need to apply the sigmoid activation function to logits $h = [h_1, h_2,...,h_M]$ instead of the softmax activation function as usual to obtain $p = [p_1, p_2,...,p_M]$, meaning that $p_i = sigmoid(h_i), i=1,...,M$. Note that $M$ is the number of classes.
  • Given a data example $x$ with the ground-truth label $y$, the idea is to maximize the likelihood $p_y$ and to minimize the likelihoods $p_i, i \neq y$. Therefore, the objective function is to find the model parameters to
    • $\max\left\{ \log p_{y}+\sum_{i\neq y}\log(1-p_{i})\right\}$ or equivalently $\min\left\{ -\log p_{y}-\sum_{i\neq y}\log(1-p_{i})\right\}$.
    • For example, if $M=3$ and $y=2$, you need to minimize $\min\left\{ -\log(1-p_{1})-\log p_{2}-\log(1-p_{3})\right\}$.

Compare the model trained with the OVA loss and the same model trained with the standard cross-entropy loss.

[4 points]
In [ ]:
import torch.nn.functional as F
def ova_loss(logits, target, num_classes=20):
    probs = torch.sigmoid(logits)
    target_one_hot = F.one_hot(target, num_classes=num_classes).float()

    pos_loss = -torch.log(probs + 1e-8) * target_one_hot
    neg_loss = -torch.log(1 - probs + 1e-8) * (1 - target_one_hot)

    loss = pos_loss.sum(dim=1) + neg_loss.sum(dim=1)
    return loss.sum()
In [ ]:
#Your code here
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def ova_fit(model= None, train_loader = None, valid_loader= None, optimizer = None,
        num_epochs = 50, verbose = True, seed= 1234, loss_fn = ova_loss):
  torch.manual_seed(seed)
  # Move the model to the device before initializing the optimizer
  model.to(device) # Move the model to the GPU

  if optimizer == None:
    optim = torch.optim.Adam(model.parameters(), lr = 0.001) # Now initialize optimizer with model on GPU
  else:
    optim = optimizer
  history = dict()
  history['val_loss'] = list()
  history['val_acc'] = list()
  history['train_loss'] = list()
  history['train_acc'] = list()

  for epoch in range(num_epochs):
    model.train()
    for (X, y) in train_loader:
      # Move input data to the same device as the model
      X,y = X.to(device), y.to(device)
      # Forward pass
      outputs = model(X.type(torch.float32)) # X is already on the correct device
      loss = loss_fn(outputs, y.type(torch.long))
      # Backward and optimize
      optim.zero_grad()
      loss.backward()
      optim.step()
    #losses and accuracies for epoch
    val_loss = compute_loss(model, loss_fn, valid_loader)
    val_acc = compute_acc(model, valid_loader)
    train_loss = compute_loss(model, loss_fn, train_loader)
    train_acc = compute_acc(model, train_loader)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    if not verbose: #verbose = True means we do show the training information during training
      print(f"Epoch {epoch+1}/{num_epochs}")
      print(f"train loss= {train_loss:.4f} - train acc= {train_acc*100:.2f}% - valid loss= {val_loss:.4f} - valid acc= {val_acc*100:.2f}%")
  return history

history = ova_fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 80, verbose = False)

"""
To compare the performance of the OVA loss function and Cross-Entropy loss function, we can notice that the loss on average is higher on OVA, but the accuracy is higher as well. This is because the OVA loss function is more sensitive to the misclassification of the positive class.
The OVA loss function is more suitable for imbalanced datasets, where the positive class is underrepresented. In this case, the OVA loss function is more effective than the Cross-Entropy loss function.
However, we can notice that the training accuracy is around 90% and the validation accuracy is around 70%. This indicates that the model is overfitting the training data when using the OVA loss function.
"""
Epoch 1/80
train loss= 47.1366 - train acc= 75.51% - valid loss= 57.8948 - valid acc= 67.90%
Epoch 2/80
train loss= 42.1836 - train acc= 81.01% - valid loss= 55.3828 - valid acc= 71.70%
Epoch 3/80
train loss= 39.3105 - train acc= 83.12% - valid loss= 52.9059 - valid acc= 72.76%
Epoch 4/80
train loss= 39.9245 - train acc= 82.70% - valid loss= 54.4956 - valid acc= 71.17%
Epoch 5/80
train loss= 37.1610 - train acc= 84.43% - valid loss= 51.7609 - valid acc= 73.60%
Epoch 6/80
train loss= 36.0386 - train acc= 85.77% - valid loss= 50.6429 - valid acc= 74.97%
Epoch 7/80
train loss= 35.2150 - train acc= 85.93% - valid loss= 51.9660 - valid acc= 74.23%
Epoch 8/80
train loss= 38.0244 - train acc= 84.51% - valid loss= 54.4669 - valid acc= 74.55%
Epoch 9/80
train loss= 32.4042 - train acc= 87.50% - valid loss= 49.1709 - valid acc= 75.40%
Epoch 10/80
train loss= 34.7495 - train acc= 86.20% - valid loss= 53.1989 - valid acc= 72.12%
Epoch 11/80
train loss= 35.5435 - train acc= 86.42% - valid loss= 52.7807 - valid acc= 74.87%
Epoch 12/80
train loss= 32.8508 - train acc= 86.97% - valid loss= 50.8258 - valid acc= 74.55%
Epoch 13/80
train loss= 35.7059 - train acc= 87.85% - valid loss= 54.5461 - valid acc= 74.87%
Epoch 14/80
train loss= 34.1838 - train acc= 87.62% - valid loss= 52.0642 - valid acc= 76.56%
Epoch 15/80
train loss= 33.4355 - train acc= 87.86% - valid loss= 51.1606 - valid acc= 74.76%
Epoch 16/80
train loss= 32.5948 - train acc= 87.26% - valid loss= 51.5082 - valid acc= 75.29%
Epoch 17/80
train loss= 33.4538 - train acc= 88.81% - valid loss= 49.2051 - valid acc= 77.82%
Epoch 18/80
train loss= 36.5964 - train acc= 85.08% - valid loss= 57.8373 - valid acc= 71.81%
Epoch 19/80
train loss= 31.5689 - train acc= 88.18% - valid loss= 51.6951 - valid acc= 74.87%
Epoch 20/80
train loss= 33.0535 - train acc= 88.16% - valid loss= 53.7252 - valid acc= 74.87%
Epoch 21/80
train loss= 31.8304 - train acc= 88.91% - valid loss= 51.8301 - valid acc= 74.97%
Epoch 22/80
train loss= 33.7566 - train acc= 87.98% - valid loss= 53.8298 - valid acc= 75.29%
Epoch 23/80
train loss= 30.8250 - train acc= 89.11% - valid loss= 51.0864 - valid acc= 75.71%
Epoch 24/80
train loss= 36.1893 - train acc= 87.48% - valid loss= 56.0976 - valid acc= 75.50%
Epoch 25/80
train loss= 38.0681 - train acc= 87.17% - valid loss= 58.2242 - valid acc= 74.97%
Epoch 26/80
train loss= 33.2921 - train acc= 88.44% - valid loss= 53.5809 - valid acc= 75.92%
Epoch 27/80
train loss= 31.1200 - train acc= 89.12% - valid loss= 49.8280 - valid acc= 75.71%
Epoch 28/80
train loss= 29.5416 - train acc= 89.62% - valid loss= 50.0017 - valid acc= 75.82%
Epoch 29/80
train loss= 30.9798 - train acc= 90.19% - valid loss= 52.1469 - valid acc= 75.08%
Epoch 30/80
train loss= 29.7872 - train acc= 89.11% - valid loss= 49.8789 - valid acc= 74.23%
Epoch 31/80
train loss= 29.5650 - train acc= 90.34% - valid loss= 51.8247 - valid acc= 75.92%
Epoch 32/80
train loss= 29.7837 - train acc= 89.54% - valid loss= 50.1356 - valid acc= 77.09%
Epoch 33/80
train loss= 29.9746 - train acc= 88.31% - valid loss= 54.3849 - valid acc= 72.54%
Epoch 34/80
train loss= 32.7018 - train acc= 89.44% - valid loss= 56.3692 - valid acc= 74.55%
Epoch 35/80
train loss= 31.4433 - train acc= 89.65% - valid loss= 56.2828 - valid acc= 76.35%
Epoch 36/80
train loss= 30.7911 - train acc= 90.16% - valid loss= 53.7645 - valid acc= 76.77%
Epoch 37/80
train loss= 26.8047 - train acc= 90.26% - valid loss= 51.1433 - valid acc= 74.87%
Epoch 38/80
train loss= 32.5631 - train acc= 89.58% - valid loss= 54.2892 - valid acc= 76.03%
Epoch 39/80
train loss= 31.2332 - train acc= 89.60% - valid loss= 52.3458 - valid acc= 75.92%
Epoch 40/80
train loss= 32.6994 - train acc= 89.46% - valid loss= 57.3138 - valid acc= 74.55%
Epoch 41/80
train loss= 30.0134 - train acc= 89.17% - valid loss= 53.6181 - valid acc= 76.03%
Epoch 42/80
train loss= 29.6204 - train acc= 90.98% - valid loss= 53.6733 - valid acc= 75.92%
Epoch 43/80
train loss= 29.2270 - train acc= 89.89% - valid loss= 54.2264 - valid acc= 76.35%
Epoch 44/80
train loss= 30.1961 - train acc= 88.54% - valid loss= 53.7974 - valid acc= 74.45%
Epoch 45/80
train loss= 28.5713 - train acc= 91.16% - valid loss= 52.8153 - valid acc= 76.03%
Epoch 46/80
train loss= 32.9081 - train acc= 90.26% - valid loss= 58.6334 - valid acc= 74.76%
Epoch 47/80
train loss= 30.0000 - train acc= 90.98% - valid loss= 54.2082 - valid acc= 77.51%
Epoch 48/80
train loss= 33.3750 - train acc= 87.92% - valid loss= 58.9772 - valid acc= 71.38%
Epoch 49/80
train loss= 30.8531 - train acc= 88.86% - valid loss= 55.8187 - valid acc= 73.81%
Epoch 50/80
train loss= 29.5917 - train acc= 90.29% - valid loss= 53.8659 - valid acc= 74.23%
Epoch 51/80
train loss= 32.2624 - train acc= 89.67% - valid loss= 55.7454 - valid acc= 72.44%
Epoch 52/80
train loss= 26.6029 - train acc= 91.30% - valid loss= 51.6427 - valid acc= 75.40%
Epoch 53/80
train loss= 25.1188 - train acc= 91.37% - valid loss= 50.4571 - valid acc= 74.02%
Epoch 54/80
train loss= 35.0702 - train acc= 90.23% - valid loss= 61.4567 - valid acc= 74.66%
Epoch 55/80
train loss= 29.1653 - train acc= 90.81% - valid loss= 55.1056 - valid acc= 74.97%
Epoch 56/80
train loss= 26.5325 - train acc= 91.88% - valid loss= 51.4362 - valid acc= 77.09%
Epoch 57/80
train loss= 25.3047 - train acc= 92.09% - valid loss= 50.5529 - valid acc= 76.45%
Epoch 58/80
train loss= 27.5191 - train acc= 91.20% - valid loss= 54.8270 - valid acc= 74.87%
Epoch 59/80
train loss= 28.1340 - train acc= 91.71% - valid loss= 55.4791 - valid acc= 74.87%
Epoch 60/80
train loss= 28.0944 - train acc= 92.33% - valid loss= 54.1611 - valid acc= 75.82%
Epoch 61/80
train loss= 29.0209 - train acc= 91.13% - valid loss= 54.9859 - valid acc= 74.76%
Epoch 62/80
train loss= 29.9838 - train acc= 91.89% - valid loss= 58.5594 - valid acc= 76.03%
Epoch 63/80
train loss= 29.5120 - train acc= 91.04% - valid loss= 55.5406 - valid acc= 73.07%
Epoch 64/80
train loss= 27.0344 - train acc= 91.27% - valid loss= 56.6410 - valid acc= 73.39%
Epoch 65/80
train loss= 27.7033 - train acc= 92.32% - valid loss= 55.1492 - valid acc= 75.29%
Epoch 66/80
train loss= 27.2550 - train acc= 92.33% - valid loss= 56.0345 - valid acc= 73.92%
Epoch 67/80
train loss= 24.4578 - train acc= 92.22% - valid loss= 50.5115 - valid acc= 76.77%
Epoch 68/80
train loss= 28.4512 - train acc= 91.20% - valid loss= 58.2644 - valid acc= 75.61%
Epoch 69/80
train loss= 30.9235 - train acc= 89.86% - valid loss= 61.0017 - valid acc= 71.81%
Epoch 70/80
train loss= 29.8912 - train acc= 90.98% - valid loss= 58.1101 - valid acc= 75.40%
Epoch 71/80
train loss= 28.6128 - train acc= 91.20% - valid loss= 57.3786 - valid acc= 75.18%
Epoch 72/80
train loss= 27.9612 - train acc= 92.05% - valid loss= 54.5600 - valid acc= 74.34%
Epoch 73/80
train loss= 24.2196 - train acc= 93.31% - valid loss= 51.7744 - valid acc= 76.98%
Epoch 74/80
train loss= 26.4744 - train acc= 92.75% - valid loss= 57.8290 - valid acc= 74.97%
Epoch 75/80
train loss= 30.8405 - train acc= 91.50% - valid loss= 60.9488 - valid acc= 74.66%
Epoch 76/80
train loss= 24.7341 - train acc= 92.50% - valid loss= 53.5390 - valid acc= 73.71%
Epoch 77/80
train loss= 23.2458 - train acc= 92.26% - valid loss= 52.5142 - valid acc= 74.13%
Epoch 78/80
train loss= 29.0996 - train acc= 91.95% - valid loss= 59.7054 - valid acc= 73.81%
Epoch 79/80
train loss= 26.3712 - train acc= 91.13% - valid loss= 54.8839 - valid acc= 75.40%
Epoch 80/80
train loss= 23.8285 - train acc= 93.30% - valid loss= 53.3402 - valid acc= 75.82%

Question 3.6: Attack your best obtained model with PGD attacks with $\epsilon= 0.0313, k=20, \eta= 0.002$ on the testing set. Write the code for the attacks and report the robust accuracies. Also choose a random set of 20 clean images in the testing set and visualize the original and attacked images.

[4 points]
In [ ]:
#Your code here
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torchvision import transforms

def pgd_attack(model, input_image, input_label=None,
               epsilon=0.0313,  # Updated epsilon
               num_steps=20,    # Updated number of steps
               step_size=0.002, # Updated step size
               clip_value_min=0.,
               clip_value_max=1.0):
    if type(input_image) is np.ndarray:
        input_image = torch.tensor(input_image, dtype=torch.float32, requires_grad=True)

    if type(input_label) is np.ndarray:
        input_label = torch.tensor(input_label, dtype=torch.long)

    # Ensure the model is in evaluation mode
    model.eval()

    # Create a copy of the input image and set it to require gradients
    adv_image = input_image.clone().detach().requires_grad_(True)

    # Random initialization around input_image
    random_noise = torch.FloatTensor(input_image.shape).uniform_(-epsilon, epsilon)
    adv_image = adv_image + random_noise
    adv_image = torch.clamp(adv_image, clip_value_min, clip_value_max).detach().requires_grad_(True)

    # If no input label is provided, use the model's prediction
    if input_label is None:
        output = model(adv_image)
        input_label = torch.argmax(output, dim=1)

    # Perform PGD attack
    for _ in range(num_steps):
        adv_image.requires_grad_(True)
        output = model(adv_image)
        loss = nn.CrossEntropyLoss()(output, input_label)
        model.zero_grad()
        loss.backward()

        if adv_image.grad is not None:
            gradient = adv_image.grad.data
            adv_image = adv_image + step_size * gradient.sign()
            adv_image = torch.clamp(adv_image, input_image - epsilon, input_image + epsilon)
            adv_image = torch.clamp(adv_image, clip_value_min, clip_value_max).detach()
        else:
            print("Warning: Gradient is None. Check for detach operations.")

    return adv_image.detach()


epsilon = 0.0313
num_steps = 20
step_size = 0.002


# history = fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 1, verbose = False)

for (images, labels) in val_loader:
    images = images[:20]
    labels = labels[:20]

    model_images = images.clone()
    model_labels = my_cnn(images).argmax(dim=1)

    adv_images = pgd_attack(my_cnn, images, labels, epsilon=epsilon, num_steps=num_steps, step_size=step_size)
    adv_labels = my_cnn(adv_images).argmax(dim=1)

    print("Correct Labels")
    visualize_data(images, labels, images_per_row=10)

    print("Model Predictions")
    visualize_data(model_images, model_labels, images_per_row=10)

    print("Adversarial Examples")
    visualize_data(adv_images, adv_labels, images_per_row=10)

    adv_correct = 0
    for (correct, predicted) in zip(labels, adv_labels):
        if correct == predicted:
            adv_correct += 1

    print(f"Adversarial Image Accuracy: {adv_correct / len(labels) * 100:.2f}%")
    break
Correct Labels
No description has been provided for this image
Model Predictions
No description has been provided for this image
Adversarial Examples
No description has been provided for this image
Adversarial Image Accuracy: 0.00%

Question 3.7: Train a robust model using adversarial training with PGD ${\epsilon= 0.0313, k=10, \eta= 0.002}$. Write the code for the adversarial training and report the robust accuracies. After finishing the training, you need to store your best robust model in the folder ./models and load the model to evaluate the robust accuracies for PGD and FGSM attacks with $\epsilon= 0.0313, k=20, \eta= 0.002$ on the testing set.

[4 points]
In [ ]:
def fgsm_attack(model, input_image, input_label=None,
                epsilon=0.3,
                clip_value_min=0.,
                clip_value_max=1.0):

  if type(input_image) is np.ndarray:
      input_image = torch.tensor(input_image, requires_grad=False)

  if type(input_label) is np.ndarray:
      input_label = torch.tensor(input_label)

  # Ensure the model is in evaluation mode
  model.eval()
  # Create a copy of the input image and set it to require gradients
  adv_image = input_image.clone().detach().requires_grad_(True)  # Ensure requires_grad is True
  # Random initialization around input_image
  random_noise = torch.FloatTensor(input_image.shape).uniform_(-epsilon, epsilon)
  adv_image = adv_image + random_noise
  adv_image = torch.clamp(adv_image, clip_value_min, clip_value_max).detach().requires_grad_(True)
  output = model(adv_image)
  if input_label is not None:
      loss = F.cross_entropy(output, input_label)  # use ground-truth label to attack
  else:
      pred_label = output.argmax(dim=1)  # use predicted label to attack
      loss = F.cross_entropy(output, pred_label)

  model.zero_grad()
  loss.backward()
  if adv_image.grad is not None:
    gradient = adv_image.grad.data
    adv_image = input_image + epsilon * gradient.sign()
    adv_image = torch.clamp(adv_image, clip_value_min, clip_value_max)
  return adv_image.detach()
In [ ]:
#Your code here

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def pgd_fit(model= None, train_loader = None, valid_loader= None, optimizer = None,
        num_epochs = 50, verbose = True, seed= 1234):
  torch.manual_seed(seed)
  # Move the model to the device before initializing the optimizer
  model.to(device) # Move the model to the GPU

  if optimizer == None:
    optim = torch.optim.Adam(model.parameters(), lr = 0.001) # Now initialize optimizer with model on GPU
  else:
    optim = optimizer
  history = dict()
  history['val_loss'] = list()
  history['val_acc'] = list()
  history['train_loss'] = list()
  history['train_acc'] = list()

  for epoch in range(num_epochs):
    model.train()
    for (X, y) in train_loader:
      # Move input data to the same device as the model
      X,y = X.to(device), y.to(device)
      adv_X = pgd_attack(model, X, y, epsilon=0.0313, num_steps=10, step_size=0.002)

      # Forward pass
      loss = (loss_fn(model(X.type(torch.float32)), y.type(torch.long)) + loss_fn(model(adv_X.type(torch.float32)), y.type(torch.long))) / 2
      # Backward and optimize
      optim.zero_grad()
      loss.backward()
      optim.step()


    #losses and accuracies for epoch
    val_loss = compute_loss(model, loss_fn, valid_loader)
    val_acc = compute_acc(model, valid_loader)
    train_loss = compute_loss(model, loss_fn, train_loader)
    train_acc = compute_acc(model, train_loader)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    if not verbose: #verbose = True means we do show the training information during training
      print(f"Epoch {epoch+1}/{num_epochs}")
      print(f"train loss= {train_loss:.4f} - train acc= {train_acc*100:.2f}% - valid loss= {val_loss:.4f} - valid acc= {val_acc*100:.2f}%")
  return history

# history = pgd_fit(model= my_cnn, train_loader=train_loader, valid_loader = val_loader, optimizer = optimizer, num_epochs= 20, verbose = True)

epsilon = 0.0313
num_steps = 20
step_size = 0.002

for (images, labels) in val_loader:
    images = images[:20]
    labels = labels[:20]

    model_images = images.clone()
    model_labels = my_cnn(images).argmax(dim=1)

    pgd_adv_images = pgd_attack(my_cnn, images, labels, epsilon=epsilon, num_steps=num_steps, step_size=step_size)
    pgd_adv_labels = my_cnn(adv_images).argmax(dim=1)

    fgsm_adv_images = fgsm_attack(my_cnn, images, labels, epsilon=epsilon)
    fgsm_adv_labels = my_cnn(adv_images).argmax(dim=1)

    print("Correct Labels")
    visualize_data(images, labels, images_per_row=10)

    print("Model Predictions")
    visualize_data(model_images, model_labels, images_per_row=10)

    print("PGD Adversarial Examples")
    visualize_data(pgd_adv_images, pgd_adv_labels, images_per_row=10)

    print("FGSM Adversarial Examples")
    visualize_data(fgsm_adv_images, fgsm_adv_labels, images_per_row=10)

    adv_correct = 0
    for (correct, predicted) in zip(labels, pgd_adv_labels):
        if correct == predicted:
            adv_correct += 1

    print(f"PGD Adversarial Image Accuracy: {adv_correct / len(labels) * 100:.2f}%")

    adv_correct = 0
    for (correct, predicted) in zip(labels, fgsm_adv_labels):
        if correct == predicted:
            adv_correct += 1

    print(f"FGSM Adversarial Image Accuracy: {adv_correct / len(labels) * 100:.2f}%")
    break
Correct Labels
No description has been provided for this image
Model Predictions
No description has been provided for this image
PGD Adversarial Examples
No description has been provided for this image
FGSM Adversarial Examples
No description has been provided for this image
PGD Adversarial Image Accuracy: 30.00%
FGSM Adversarial Image Accuracy: 30.00%

Question 3.8 (Kaggle competition)

[10 points]

You can reuse the best model obtained in this assignment or develop new models to evaluate on the testing set of the FIT3181/5215 Kaggle competion. However, to gain any points for this question, your testing accuracy must exceed the accuracy threshold from a base model developed by us as shown in the leader board of the competition.

The marks for this question are as follows:

  • If you are in top 10% of your cohort, you gain 10 points.
  • If you are in top 20% of your cohort, you gain 8 points.
  • If you are in top 30% of your cohort, you gain 6 points.
  • If you beat our base model, you gain 4 points.

END OF ASSIGNMENT
GOOD LUCK WITH YOUR ASSIGNMENT 1!